Matched Filters in Spectroscopy, Preston Bohm

The explorer above simulates a weak absorption feature buried in noise on a slowly varying baseline. Crank up the lamp drift to watch the GLS estimator absorb the baseline while the line holds its significance, use the mismatch slider to see template errors bias the amplitude, and toggle between doublet and Lorentzian shapes to see how geometry changes the response.

The problem: weak features on ugly backgrounds

A recurring task in absorption and emission spectroscopy is deciding whether a known feature is present in a spectrum and, if so, how strong. The line shape is known from laboratory reference data, a spectral template, but the feature is weak: a few milli-absorbance units against comparable detector noise. Worse, real spectra never sit on a flat baseline. Lamp intensity drifts between reference and sample, detector response varies with wavelength, and stray light and unresolved broadband absorbers add baseline curvature, all nuisance backgrounds typically far larger than the signal. This is the everyday situation in differential optical absorption spectroscopy (DOAS) [6] and in any instrument hunting for small lines.

Write the measured spectrum (in absorbance, the negative log of transmission against a reference) as a vector over N wavelength channels:

\[ \mathbf{x} \;=\; \alpha\,\mathbf{g} \;+\; \mathbf{B}\boldsymbol{\beta} \;+\; \mathbf{n} \]

Here \( \mathbf{g} \in \mathbb{R}^N \) is the template (the expected absorbance shape, unit peak depth) and \( \alpha \) the unknown amplitude, proportional to column density or concentration via Beer–Lambert. The matrix \( \mathbf{B} \in \mathbb{R}^{N \times m} \) collects the nuisance terms: a column of ones for the offset, a slope, quadratic curvature, and low-order shapes (polynomials or splines) for lamp drift or detector response. The vector \( \mathbf{n} \) is detector noise with covariance \( \mathbf{C} = \mathbb{E}[\mathbf{n}\mathbf{n}^\top] \), covering shot noise, read noise, and channel-to-channel correlations from the readout or prior smoothing.
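To make the notation concrete, here is a minimal sketch that simulates one spectrum from this model. The channel count, wavelength axis, Lorentzian width, baseline coefficients, and noise level are all illustrative assumptions, not values taken from the explorer.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, for reproducibility only

N = 201                                    # number of wavelength channels (assumed)
wl = np.linspace(-1.0, 1.0, N)             # scaled wavelength axis (assumed)

# Template g: Lorentzian with unit peak depth, centred at 0 (assumed shape).
hwhm = 0.05                                # half width at half maximum (assumed)
g = 1.0 / (1.0 + (wl / hwhm) ** 2)

# Nuisance matrix B: offset, slope, and quadratic curvature (m = 3 columns).
B = np.column_stack([np.ones(N), wl, wl ** 2])

alpha_true = 2e-3                          # a few milli-absorbance units
beta_true = np.array([0.05, 0.02, -0.01])  # baseline far larger than the line
sigma = 1e-3                               # white noise here, so C = sigma^2 I
C = sigma ** 2 * np.eye(N)

n = rng.normal(0.0, sigma, N)              # one noise realisation
x = alpha_true * g + B @ beta_true + n     # x = alpha g + B beta + n
```

The baseline term dominates `x` by more than an order of magnitude, which is the regime the rest of the article is about.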

The matched filter as a noise-weighted projection

Forget the baseline for a moment (\( \boldsymbol{\beta} = 0 \)) and ask for the best linear estimate of \( \alpha \). Generalized least squares gives the minimum-variance unbiased answer [3], [5]:

\[ \hat{\alpha} \;=\; \frac{\mathbf{g}^\top \mathbf{C}^{-1} \mathbf{x}}{\mathbf{g}^\top \mathbf{C}^{-1} \mathbf{g}}, \qquad \operatorname{var}(\hat{\alpha}) \;=\; \frac{1}{\mathbf{g}^\top \mathbf{C}^{-1} \mathbf{g}} \]

This is the matched filter in honest form: project the data onto the template, weighting every channel by the inverse noise covariance. Low-noise channels count more, and correlated noise patterns are rotated away before the projection. The classical result that the optimal detector is a "time-reversed conjugate copy of the signal" [1], [4] is exactly this inner product written as a convolution.
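The estimator above is only a few lines of NumPy. The sketch below is one way to write it; the function name `matched_filter` and the toy data (Gaussian template, noise that grows across the band) are assumptions made for illustration.

```python
import numpy as np

def matched_filter(x, g, C):
    """Noise-weighted amplitude estimate and its standard deviation (no baseline)."""
    Cinv_g = np.linalg.solve(C, g)       # C^{-1} g, without forming the inverse
    norm = g @ Cinv_g                    # g^T C^{-1} g
    alpha_hat = (Cinv_g @ x) / norm      # g^T C^{-1} x / g^T C^{-1} g (C symmetric)
    return alpha_hat, 1.0 / np.sqrt(norm)

# Toy data: per-channel noise that rises across the band (assumed).
rng = np.random.default_rng(1)
N = 101
wl = np.linspace(-1.0, 1.0, N)
g = np.exp(-0.5 * (wl / 0.1) ** 2)       # Gaussian template (assumed)
sig = np.linspace(0.5e-3, 2e-3, N)       # per-channel standard deviation
C = np.diag(sig ** 2)
x = 3e-3 * g + rng.normal(0.0, sig)      # flat baseline, beta = 0

alpha_hat, sigma_alpha = matched_filter(x, g, C)
```

Solving the linear system instead of inverting `C` is a numerical choice, not part of the definition; for a diagonal `C` it reduces to dividing each channel by its variance.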

The textbook simplification

\[ \hat{\alpha} \;=\; \frac{\mathbf{g}^\top \mathbf{x}}{\mathbf{g}^\top \mathbf{g}} \]

drops \( \mathbf{C}^{-1} \), legitimate only when the noise is white with equal variance in every channel, \( \mathbf{C} = \sigma^2 \mathbf{I} \), so the weighting cancels. For a real spectrometer, where shot noise scales with signal level, hot pixels exist, and the readout correlates neighboring channels, the plain dot product is no longer optimal and its quoted uncertainty is wrong. It stays unbiased for the signal but blind to a bigger problem: the baseline.
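The cost of dropping the weighting can be computed without any simulation. The sketch below assumes a diagonal covariance with a handful of hot pixels near the line core (an invented configuration) and compares three numbers: the variance of the weighted estimator, the true variance of the plain dot product, and the variance one would quote by pretending the noise is white at the typical level.

```python
import numpy as np

N = 101
wl = np.linspace(-1.0, 1.0, N)
g = np.exp(-0.5 * (wl / 0.1) ** 2)            # Gaussian template (assumed)

sig = np.full(N, 1e-3)                        # typical per-channel noise
sig[45:50] = 1e-2                             # five hot pixels near the core (assumed)
C = np.diag(sig ** 2)

# Weighted estimator: var = 1 / (g^T C^{-1} g)
var_gls = 1.0 / (g @ np.linalg.solve(C, g))

# Plain dot product g^T x / g^T g: its true variance is g^T C g / (g^T g)^2
var_plain = (g @ C @ g) / (g @ g) ** 2

# Variance quoted if one wrongly assumes C = sigma^2 I with the median sigma
var_quoted = np.median(sig) ** 2 / (g @ g)
```

By the Cauchy-Schwarz inequality `var_gls` can never exceed `var_plain`, and in this configuration `var_quoted` understates the true scatter of the plain estimator.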

Nuisance backgrounds and generalized least squares

The naive dot product treats everything in \( \mathbf{x} \) overlapping the template as signal. A sloped or curved baseline overlaps every template at some level, so lamp drift leaks into \( \hat{\alpha} \) as bias, visible in the explorer as structured residuals when you raise the drift slider. The fix is not to subtract a hand-drawn baseline first but to estimate signal and baseline jointly. Stack the template and nuisance columns into one design matrix \( \mathbf{A} = [\,\mathbf{g} \;\; \mathbf{B}\,] \) and solve the generalized least-squares problem [3]:

\[ \begin{bmatrix} \hat{\alpha} \\ \hat{\boldsymbol{\beta}} \end{bmatrix} \;=\; \left( \mathbf{A}^\top \mathbf{C}^{-1} \mathbf{A} \right)^{-1} \mathbf{A}^\top \mathbf{C}^{-1} \mathbf{x} \]

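An equivalent, illuminating form: first remove from both data and template the part that the baseline can explain, using the \( \mathbf{C}^{-1} \)-weighted projection \( \tilde{\mathbf{v}} = \mathbf{v} - \mathbf{B}(\mathbf{B}^\top \mathbf{C}^{-1} \mathbf{B})^{-1}\mathbf{B}^\top \mathbf{C}^{-1}\mathbf{v} \) for \( \mathbf{v} = \mathbf{x}, \mathbf{g} \) (an ordinary orthogonal projection when the noise is white), then matched-filter what is left: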

\[ \hat{\alpha} \;=\; \frac{\tilde{\mathbf{g}}^\top \mathbf{C}^{-1} \tilde{\mathbf{x}}}{\tilde{\mathbf{g}}^\top \mathbf{C}^{-1} \tilde{\mathbf{g}}} \]

In words: only the part of the template that cannot be mimicked by offset, slope, curvature, or lamp drift carries information about the line. This is the "differential" in DOAS: broadband structure goes to the polynomial, and only the narrow differential structure of the cross-section is used for quantification [6]. The nuisance model costs a modest variance increase, \( \tilde{\mathbf{g}}^\top \mathbf{C}^{-1} \tilde{\mathbf{g}} \le \mathbf{g}^\top \mathbf{C}^{-1} \mathbf{g} \), since some template energy is sacrificed to the baseline subspace. That trade is almost always worth it.
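The equivalence of the stacked solve and the project-then-filter form is easy to check numerically. The sketch below is one possible implementation under assumed toy data (Lorentzian template, quadratic baseline, AR(1)-like correlated noise); the function names are hypothetical.

```python
import numpy as np

def gls_joint(x, g, B, C):
    """Solve the stacked problem with A = [g B]; return (alpha_hat, sigma_alpha)."""
    A = np.column_stack([g, B])
    Cinv_A = np.linalg.solve(C, A)             # C^{-1} A
    F = A.T @ Cinv_A                           # A^T C^{-1} A
    theta = np.linalg.solve(F, Cinv_A.T @ x)   # [alpha_hat, beta_hat]
    return theta[0], np.sqrt(np.linalg.inv(F)[0, 0])

def gls_projected(x, g, B, C):
    """Same estimate: strip the baseline (C^{-1}-weighted), then matched-filter."""
    Cinv_B = np.linalg.solve(C, B)
    G = B.T @ Cinv_B                           # B^T C^{-1} B

    def strip_baseline(v):
        return v - B @ np.linalg.solve(G, Cinv_B.T @ v)

    g_t, x_t = strip_baseline(g), strip_baseline(x)
    Cinv_gt = np.linalg.solve(C, g_t)
    norm = g_t @ Cinv_gt                       # g~^T C^{-1} g~
    return (Cinv_gt @ x_t) / norm, 1.0 / np.sqrt(norm)

# Toy data (all values assumed for illustration).
rng = np.random.default_rng(2)
N = 151
wl = np.linspace(-1.0, 1.0, N)
g = 1.0 / (1.0 + (wl / 0.05) ** 2)
B = np.column_stack([np.ones(N), wl, wl ** 2])
idx = np.arange(N)
C = (1e-3) ** 2 * 0.5 ** np.abs(idx[:, None] - idx[None, :])  # correlated neighbours
noise = np.linalg.cholesky(C) @ rng.standard_normal(N)
x = 2e-3 * g + B @ np.array([0.05, 0.02, -0.01]) + noise

a_joint, s_joint = gls_joint(x, g, B, C)
a_proj, s_proj = gls_projected(x, g, B, C)
s_no_baseline = 1.0 / np.sqrt(g @ np.linalg.solve(C, g))      # if beta were known to be 0
```

The two routes agree to rounding error, and `s_proj` is never smaller than `s_no_baseline`, which is the variance price of fitting the baseline.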

Detection: the test statistic

Estimation and detection are two views of the same projection. Under the no-signal hypothesis, \( \hat{\alpha} \) is zero-mean Gaussian with the variance above, so

\[ z \;=\; \frac{\hat{\alpha}}{\sigma_{\hat{\alpha}}}, \qquad \sigma_{\hat{\alpha}} = \left( \tilde{\mathbf{g}}^\top \mathbf{C}^{-1} \tilde{\mathbf{g}} \right)^{-1/2} \]

is a standard normal score, and thresholding \( z \) is the generalized likelihood-ratio test for this model [2]. Panel ④ computes \( z \) with the template re-centered at every candidate wavelength \( \lambda_0 \), a matched-filter scan; a genuine line produces a sharp peak at its true position. The expected significance of a line of depth \( \alpha \) is \( \mathbb{E}[z] = \alpha \sqrt{ \tilde{\mathbf{g}}^\top \mathbf{C}^{-1} \tilde{\mathbf{g}} } \): deeper lines, more channels, and quieter detectors all help just as intuition says.
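A matched-filter scan of this kind can be sketched as follows. This is not the explorer's code: it assumes white noise of known level (so stripping the baseline is an ordinary least-squares projection), a Lorentzian template of known width, and a quadratic baseline; `z_scan` and every numeric value are illustrative.

```python
import numpy as np

def z_scan(x, wl, centers, B, sigma, hwhm):
    """z score for a Lorentzian template recentred at each candidate wavelength.

    Assumes C = sigma^2 I, so z = g~^T x~ / (sigma * sqrt(g~^T g~)).
    """
    Q, _ = np.linalg.qr(B)                 # orthonormal basis of the baseline subspace
    x_t = x - Q @ (Q.T @ x)                # data with the baseline fit removed
    z = np.empty(len(centers))
    for k, c in enumerate(centers):
        g = 1.0 / (1.0 + ((wl - c) / hwhm) ** 2)
        g_t = g - Q @ (Q.T @ g)            # template with the baseline fit removed
        z[k] = (g_t @ x_t) / (sigma * np.sqrt(g_t @ g_t))
    return z

# Toy spectrum with a line at wavelength 0.3 (all values assumed).
rng = np.random.default_rng(3)
N, sigma, hwhm = 401, 1e-3, 0.05
wl = np.linspace(-1.0, 1.0, N)
B = np.column_stack([np.ones(N), wl, wl ** 2])
line = 5e-3 / (1.0 + ((wl - 0.3) / hwhm) ** 2)
x = line + B @ np.array([0.05, 0.02, -0.01]) + rng.normal(0.0, sigma, N)

centers = np.linspace(-0.8, 0.8, 161)      # candidate line positions
z = z_scan(x, wl, centers, B, sigma, hwhm)
best = centers[np.argmax(z)]               # position of the strongest response
```

Because the baseline projection does not depend on the candidate position, it is computed once outside the loop; only the template is rebuilt per candidate.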

Limitations

Template mismatch. If the assumed line position, width, or shape differs from reality (calibration drift, pressure broadening, line-shape changes), the projection captures only part of the signal and \( \hat{\alpha} \) is biased low (try the mismatch slider); a shift of one line width can cost most of the significance. In practice the template position and stretch are often fit as additional, now nonlinear, parameters.
Correlated or misestimated noise. Using \( \mathbf{C} = \sigma^2\mathbf{I} \) when the noise is actually correlated leaves the estimate unbiased but makes \( \sigma_{\hat{\alpha}} \), and every quoted significance, wrong, usually optimistic. Readout correlations and prior smoothing are the common culprits; residual autocorrelation is the diagnostic.
Calibration drift between reference and sample. The absorbance model assumes the reference spectrum still describes the lamp and detector at measurement time. Slow drift not spanned by \( \mathbf{B} \) appears as structured residuals and biases \( \hat{\alpha} \); re-referencing often or adding drift terms to \( \mathbf{B} \) removes that bias at the cost of some extra variance.
Overfitting the baseline. Too flexible a \( \mathbf{B} \), with high-order polynomials or dense splines, lets the baseline absorb genuine line signal, since \( \tilde{\mathbf{g}} \to 0 \) as the nuisance subspace grows toward the template. The symptom is exploding \( \sigma_{\hat{\alpha}} \). Model-selection criteria or physical priors keep \( \mathbf{B} \) honest [7].
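Two of these failure modes can be explored with noise-free algebra. The sketch below assumes white noise, a Gaussian line, and a Legendre polynomial baseline (all illustrative choices). It computes the fraction of the true amplitude recovered when the template is shifted, and how \( \sigma_{\hat{\alpha}} \) grows as the baseline order increases.

```python
import numpy as np
from numpy.polynomial import legendre

def strip(v, B):
    """Remove the least-squares baseline fit from v (white-noise case)."""
    Q, _ = np.linalg.qr(B)
    return v - Q @ (Q.T @ v)

N, sigma = 401, 1e-3                       # channels and noise level (assumed)
wl = np.linspace(-1.0, 1.0, N)
width = 0.03                               # Gaussian sigma of the line (assumed)
fwhm = 2.0 * np.sqrt(2.0 * np.log(2.0)) * width

def gauss(center):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

# Template mismatch: true line at 0, template recentred at `shift`.
B = legendre.legvander(wl, 2)              # offset, slope, curvature
true_line = strip(gauss(0.0), B)

def recovered_fraction(shift):
    """E[alpha_hat] / alpha for a template shifted by `shift` (no noise needed)."""
    t = strip(gauss(shift), B)
    return (t @ true_line) / (t @ t)

frac_exact = recovered_fraction(0.0)       # correct template
frac_shifted = recovered_fraction(fwhm)    # template off by one full line width

# Baseline overfitting: sigma_alpha as the polynomial degree grows.
degrees = [0, 2, 5, 10, 20, 40]
sigma_alpha = []
for d in degrees:
    g_t = strip(gauss(0.0), legendre.legvander(wl, d))
    sigma_alpha.append(sigma / np.sqrt(g_t @ g_t))
sigma_alpha = np.array(sigma_alpha)
```

Since each polynomial space contains the previous one, `sigma_alpha` cannot decrease along `degrees`; plotting it against degree is a quick way to see where the baseline starts eating the line.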

References