Back to research

DeWaM: Deconvolution Wavelet Model for Microscopy Image Restoration

Early PhD exploration, 2022

This was an early PhD attempt to make microscopy image restoration more structured by moving part of the problem into a learned wavelet domain. Instead of training only a direct image-to-image U-Net, the project asked whether learnable analysis and synthesis wavelet filters could produce a useful coefficient space for deconvolution. The idea was interpretable and technically interesting, but the learned-wavelet models did not outperform the simpler U-Net baseline.

Motivation

The goal was to restore microscopy images by mapping a degraded or thick input image to a corresponding thin, deconvolved target image. I called this family of models DeWaM, short for Deconvolution Wavelet Model.

The main question was simple: can restoration improve if the model learns a wavelet representation first, rather than doing all prediction directly in pixel space?

The project followed a few stages:

Restoration Task

Each training example contained an input microscopy image and a deconvolved target:

\[ (x_i, y_i), \]

where \(x_i\) denotes the degraded or thick input and \(y_i\) denotes the deconvolved target. The experiments also considered a noisy input condition, written as \(x_i^n\):

\[ x = \begin{cases} x_i, & \text{clean input condition},\\ x_i^n, & \text{noisy input condition}. \end{cases} \]

The restoration objective was to learn a function \(F_{\theta}\) such that:

\[ \hat{y}_i = F_{\theta}(x_i) \approx y_i. \]

Fixed Wavelet Setup

The earlier version used a fixed discrete wavelet transform. The input image was decomposed into wavelet coefficients, a neural network operated on those coefficients, and a fixed inverse wavelet transform reconstructed the output:

\[ C_x = \mathcal{W}_{\mathrm{fixed}}(x), \qquad \tilde{C}_y = N_{\theta}(C_x), \qquad \hat{y} = \mathcal{W}^{-1}_{\mathrm{fixed}}(\tilde{C}_y). \]

The image-level loss was:

\[ \mathcal{L}_{\mathrm{img}} = \frac{1}{|\Omega|} \sum_{p\in\Omega} \left(\hat{y}[p]-y[p]\right)^2. \]

This setup made the representation structured, but the wavelet basis itself was still fixed.

Learning the Wavelet Basis

The next idea was to learn the analysis and synthesis filters instead of using a fixed Haar-like basis. The analysis filters were:

\[ \mathbf{h} = (h_0,h_1), \]

and the synthesis filters were:

\[ \mathbf{g} = (g_0,g_1). \]

The learned forward and inverse transforms can be written as:

\[ C_x = \mathrm{FWT}_{\mathbf{h}}(x) = \begin{bmatrix} C_x^{LL}\\ C_x^{LH}\\ C_x^{HL}\\ C_x^{HH} \end{bmatrix}, \qquad \hat{x} = \mathrm{BWT}_{\mathbf{g}}(C_x). \]

Here the \(LL\) component captures low-frequency structure, while \(LH\), \(HL\), and \(HH\) capture directional high-frequency detail. The hope was that a learned wavelet basis could adapt to microscopy structures better than a hand-chosen transform.

Learnable wavelet/scaling setup. The forward and inverse transforms are trained rather than fixed.

One encouraging observation was that the learned filters produced decompositions that resembled meaningful fixed wavelet filters. This suggested the model was not learning arbitrary filters, but it did not yet prove that the representation would improve restoration.

Outputs from the trained filters. The decomposition retained low-frequency image structure and detail-like coefficient maps.

Supervised Deconvolution Setup

The supervised deconvolution experiments were organized into pretraining and coefficient-prediction stages.

In Step 1, the learned transform pair was trained without the coefficient network:

\[ \hat{y}_{\mathrm{step1}} = \mathrm{BWT}_{\mathbf{g}} \left( \mathrm{FWT}_{\mathbf{h}}(x) \right), \]

with the loss:

\[ \mathcal{L}_{\mathrm{step1}} = \mathrm{MSE}(\hat{y}_{\mathrm{step1}}, y). \]

Step 1 Prime strengthened the pretraining by asking the learned transform pair to reconstruct both the input image and the target image:

\[ C_x = \mathrm{FWT}_{\mathbf{h}}(x), \qquad C_y = \mathrm{FWT}_{\mathbf{h}}(y), \]

\[ \mathcal{L}_{\mathrm{step1}^{\prime}} = \mathrm{MSE}(\mathrm{BWT}_{\mathbf{g}}(C_x), x) + \mathrm{MSE}(\mathrm{BWT}_{\mathbf{g}}(C_y), y). \]

Step 1 Prime pretraining. The learned transform pair is trained on both input and deconvolved target images before the coefficient model is introduced.

Coefficient-Space Model

Step 2 trained a neural network to map input wavelet coefficients to target-like wavelet coefficients:

\[ C_x = \mathrm{FWT}_{\mathbf{h}}(x), \qquad C_y = \mathrm{FWT}_{\mathbf{h}}(y). \]

The coefficient model predicted:

\[ \tilde{C}_y = N_{\theta}(C_x), \]

and the learned inverse transform reconstructed the deconvolved image:

\[ \hat{y} = \mathrm{BWT}_{\mathbf{g}}(\tilde{C}_y). \]

The training loss combined image supervision and coefficient supervision:

\[ \mathcal{L}_{\mathrm{step2}} = \mathrm{MSE}(\hat{y}, y) + \lambda_{\mathrm{coef}} \mathrm{MSE}(\tilde{C}_y, C_y). \]

Step 2 training. The coefficient model predicts target wavelet coefficients, while supervision is applied both in coefficient space and image space.

At test time, the target branch was removed. The model used only the input image:

\[ \hat{y} = \mathrm{BWT}_{\mathbf{g}} \left( N_{\theta} \left( \mathrm{FWT}_{\mathbf{h}}(x) \right) \right). \]

Step 2 inference. Only the upper branch is used: input image, learned wavelet transform, coefficient model, and learned inverse transform.

Baseline and Result

The comparison baseline was a direct U-Net:

\[ \hat{y}_{\mathrm{base}} = U_{\theta}(x), \]

trained with the same image-domain mean squared error objective. This baseline was important because it tested whether the learned wavelet representation actually helped beyond a simpler image-to-image restoration model.

Model Clean PSNR Noisy PSNR
Baseline U-Net 39.43 32.73
Step 2 37.61 28.76
Step 2 Prime 35.05 28.64

The numerical result was clear: the direct U-Net was strongest in both clean and noisy settings. The learned-wavelet models were plausible and structured, but in these runs they did not beat direct image-domain restoration.

Baseline U-Net clean qualitative result
Baseline U-Net, clean input.
Baseline U-Net noisy qualitative result
Baseline U-Net, noisy input.
Step 2 clean qualitative result
Step 2, clean input.
Step 2 noisy qualitative result
Step 2, noisy input.
Step 2 Prime clean qualitative result
Step 2 Prime, clean input.
Step 2 Prime noisy qualitative result
Step 2 Prime, noisy input.

What Did Not Work

The failure mode was not that wavelets were useless. The learned filters did produce meaningful decompositions, and coefficient-space supervision gave a reasonable formulation. The problem was that the full DeWaM pipeline had to learn two things at once: a useful wavelet representation and a coefficient-space deconvolution model.

The U-Net had a simpler job. It could spend all of its capacity on the direct thick-to-thin image mapping. In this experiment, that simpler route won.

The main takeaways were:

So the project did not become a final thesis direction, but it was useful: it tested a clean representation-learning hypothesis and ruled out one tempting route early.