Detecting Placer Mining from Sentinel-2 Time Series Using a Gaussian-Process Harmonic Baseline

Whitepaper · Placer Mining Detection Pipeline · v3

1. Introduction

1.1 The field

Placer gold mining excavates and reworks riverbed sediments. At 10 m resolution it is visible to public optical satellites — Sentinel-2 in particular — but the resulting spectral disturbance must be separated from a large background of natural variation: seasonal phenology, snow cover, river migration, dust events, and inter-annual climate drift. Most remote-sensing pipelines for surface-mining detection fall into one of three families: (i) thresholded differencing of spectral indices between two dates or two seasonal composites; (ii) supervised classifiers trained on labelled imagery; and (iii) harmonic time-series decompositions (BFAST, CCDC, LandTrendr) that fit a periodic-plus-trend model to a per-pixel index series and flag deviations from it.

1.2 Operating conditions in Mongolia

Two properties of Mongolian river basins constrain which of those families can be applied directly.

Irregular sampling. Sentinel-2 revisits the area on a fixed orbital cadence, but the usable record at each pixel is irregular. The Scene Classification Layer flags clouds, cloud shadows, cirrus, and snow; after masking, each pixel's time series has a multi-month gap every winter and shorter scattered gaps elsewhere. Algorithms that assume regular sampling — including the discrete Fourier transform — cannot be fit on this signal without first imputing values, and any imputation pre-determines what the model will later call an anomaly.

Large-amplitude seasonality. The steppe and its riparian corridors swing through a wide annual cycle: rapid green-up after snowmelt, summer drying, autumn bare-soil exposure, and a long snow-covered winter. The amplitude of these natural cycles is comparable in magnitude to the spectral change produced by mining. A single threshold applied to a date-pair difference therefore confuses seasonal swing for disturbance, and a model that adapts too quickly to recent observations will treat a freshly excavated cell as the "new normal" and stop flagging it.

1.3 Approach taken in this work

Given those two constraints, three design choices structure the rest of the paper.

(a) Periodic harmonic filtering of the seasonal cycle. The seasonal component is the largest source of nuisance variance, and it is approximately one-period-per-year. The natural tool is harmonic regression — a sum of sines and cosines at the annual frequency. Because the raw record is irregularly sampled, ordinary least-squares Fourier fitting is not viable; the Lomb-Scargle periodogram and related variants handle uneven sampling but are not well suited to extracting a smooth baseline at arbitrary query dates. A Gaussian Process with a periodic covariance kernel provides the same harmonic decomposition in a form that natively tolerates gaps and returns a posterior mean at any timestamp.

(b) Constraints on the GP so that it does only what is asked. The GP is restricted to behave as a seasonal filter rather than a general curve-fitter: the periodic component's period is fixed at exactly one year, the trend component's lengthscale is fixed at five years (so the trend can absorb only slow climatic drift, not the step-like rupture caused by excavation), and the mean function is a learned constant rather than a flexible drift. The only quantities free to be learned per pixel are the two scale-kernel outputscales, the noise level, and the constant mean.

(c) Accumulated, multi-index evidence rather than per-date thresholds. Mining is persistent and shows up jointly across multiple physical indices. The pipeline therefore (i) computes standardised residuals for four indices (NDVI, BSI, MNDWI, NDTI), (ii) accumulates each into a leaky CUSUM whose decay rate corresponds to roughly one seasonal half-life, and (iii) fuses the four CUSUM cubes by elementwise minimum, so a pixel must be simultaneously anomalous on all four channels to enter the detection mask.

2. Method

The input to the pipeline is a Sentinel-2 surface-reflectance cube over the area of interest, with one entry per acquisition date and six optical bands (blue, green, red, near-infrared, and two short-wave-infrared channels) plus the per-pixel Scene Classification Layer.

2.1 Cloud and snow masking

Each acquisition's classification layer is read in parallel with the optical bands. Pixels marked as no-data, saturated, cloud shadow, cloud (medium or high probability), cirrus, or snow are set to NaN in the optical bands. NaNs are propagated through the rest of the pipeline, not interpolated, so that gaps are visible to the model as missing observations rather than as smoothed-over guesses.

2.2 Spectral indices

Four normalised difference indices are computed from the masked bands and carried through the rest of the pipeline:

NDVI = (B08 − B04) / (B08 + B04) — vegetation cover.
BSI = ((B11 + B04) − (B08 + B02)) / ((B11 + B04) + (B08 + B02)) — bare-soil exposure.
MNDWI = (B03 − B11) / (B03 + B11) — open and turbid water.
NDTI = (B04 − B03) / (B04 + B03) — turbidity / sediment load.

They were chosen because mining produces a joint signature across the four (vegetation loss + bare soil + turbid ponds + sediment-laden runoff), whereas natural disturbances (floods, droughts, snow) excite the four in different combinations.

2.3 Gaussian Process as a gap-tolerant harmonic filter

The seasonal baseline at each pixel is modelled as a Gaussian Process whose covariance is a sum of two scaled components:

$$ k(t, t') = \sigma_{\text{per}}^2 \cdot k_{\text{periodic}}(t, t';\, p = 1\,\text{yr}) + \sigma_{\text{tr}}^2 \cdot k_{\text{RBF}}(t, t';\, \ell = 5\,\text{yr}) $$

with t measured in years from the start of the record. Three hyperparameters are fixed, not learned:

Hyperparameter	Value	Role
Periodic period p	1 year	Locks the periodic component to exactly one cycle per year.
Trend lengthscale ℓ	5 years	Forces the trend component to vary only on multi-year scales.
Mean function	constant	A single learned constant per pixel — no flexible drift in the mean.

The intent of these constraints, taken together, is that the GP can capture only two kinds of structure: an annual sinusoidal-like cycle and slow multi-year climatic drift. Anything else — including a step change caused by excavation — is forced into the residual, where it can be scored. If the trend lengthscale were small, or the mean function flexible, the GP would adapt to a freshly excavated pixel within a season or two and the disturbance would be absorbed into the baseline.

The only parameters free to be learned per pixel are the two outputscales (σ²_per, σ²_tr), the observation-noise variance, and the constant mean.

2.4 Computational optimisation

Exact Gaussian-Process inference is cubic in the number of training points, and the area of interest contains on the order of 310 000 pixels. The implementation exploits the fact that the number of valid (non-cloud, non-snow) observations differs across pixels but the linear-algebra structure is identical for any two pixels with the same valid count: pixels are grouped by their valid-observation count, each group is fitted in parallel on the GPU as a single batched problem of up to 2 048 pixels at a time, and posterior means and variances are evaluated in chunks of 50 dates. This bounds GPU memory to fit consumer-grade hardware and is purely a runtime optimisation — it does not change the mathematical model in §2.3.

2.5 Anomaly score: residual standardisation

The standardised anomaly score for each index, date, and pixel is

$$ z(t,x,y) = \mathrm{direction} \cdot \frac{\mathrm{raw}(t,x,y) - \mu_{\mathrm{GP}}(t,x,y)}{\sqrt{\sigma_{\mathrm{emp}}^2(x,y) + \sigma_{\mathrm{GP}}^2(t,x,y) + \varepsilon}} $$

where:

σ²_emp(x, y) is the empirical residual variance at each pixel, computed from the training-window residuals only (cloud-masked timesteps excluded);
σ²_GP(t, x, y) is the GP's predictive variance at the query date, which grows in periods that are weakly constrained by nearby observations;
the two are combined in quadrature so that both sources of expected variation enter the denominator;
ε is a small constant that prevents division by zero when a pixel happens to have zero variance;
the sign convention direction = −1 is used for NDVI (since the diagnostic for vegetation loss is a negative residual) and +1 for the bare-soil, water, and turbidity indices.

The effect of the empirical term is that each pixel is judged against its own historical envelope: a pixel whose pre-2021 residuals were noisy (an active river channel, for example) has a large σ²_emp and produces small z-scores for the same physical change that would produce a large z-score on a previously stable pixel. The GP-variance term prevents the score from blowing up at dates where the GP itself is poorly constrained — for example, immediately after a long winter gap.

2.6 Temporal accumulation and multi-index fusion

The per-date z-score is the evidence on a single observation. To convert it into evidence of persistent disturbance, a leaky cumulative sum (CUSUM) is applied independently to each index:

$$ C_i = \max\!\left(0,\; \mathrm{decay}^{\Delta d_i} \cdot C_{i-1} + z_i\right) $$

with a per-day decay of 0.99432, giving a half-life of ln 2 / ln(1/0.99432) ≈ 121 days — roughly one season. The accumulator is clamped at zero so that brief opposite-sign excursions do not produce negative debt that would mask a later real anomaly, and Δd_i is the actual number of days between consecutive valid timestamps (so the boundary between training and inference windows is bridged with the true gap, not reset to zero). Cloud-masked timesteps contribute nothing to the sum, but the decay still applies, so an extended invalid stretch causes the accumulator to fade rather than freeze.

The four index-specific CUSUM cubes are then fused at each timestep by elementwise minimum:

$$ C_{\mathrm{fused}}(t,x,y) = \min\{C_{\mathrm{NDVI}},\, C_{\mathrm{BSI}},\, C_{\mathrm{MNDWI}},\, C_{\mathrm{NDTI}}\}(t,x,y) $$

and a pixel is reported as anomalous on date t if and only if C_fused > 6. Because the minimum is taken pointwise across indices, no single channel can carry the score above threshold on its own; agreement across all four physical indices is required.

2.7 Detection output

At each timestamp in the record, the binary mask of pixels with fused score above the threshold is vectorised into polygons and reprojected to geographic coordinates, producing one set of detection polygons per acquisition date. The full series of per-date polygons is what the demo front-end renders on the map.

3. Results

Quantitative numbers (counts of detection dates, total flagged area, and timing of disturbance onsets) are to be filled in once the validation pass against the digitised reference is complete.

3.1 Baseline fits at the reference points

The fitted baseline was inspected at seven hand-chosen reference points covering the main land-cover and disturbance classes in the area: a stretch of natural river, a forested patch, an undisturbed flat-grass patch, two recent mining sites (dry pit and water-filled pit), and two older mining sites that were already disturbed before the training cut-off. Three distinct patterns are observed across these classes:

At new mining points (disturbed after the training cut-off): the baseline, trained only on pre-2021 observations, continues to predict the pre-disturbance seasonal envelope into 2021 and beyond. The post-2021 record diverges sharply from that envelope, and detections appear only after the cut-off — there is no spurious detection within the training window itself.
At old mining points (already disturbed within the training window): both the pre-disturbance and post-disturbance behaviour fall inside the training record, so the baseline is fitted to a history that already contains the change. The fitted envelope therefore absorbs the historical disturbance into the model, and the pixel does not light up as a new detection — consistent with the intended behaviour, which is to flag new ruptures of the per-pixel baseline rather than to re-detect historical activity that happened before observation began.
At control points (forest, flat grass, undisturbed river): the fit tracks the seasonal cycle smoothly through the winter gaps, and no detections are produced over either window.

Baseline fit at a new-mining reference point — **Figure 3.1a — New mining (post-training-cut-off).** The pre-2021 baseline continues into 2021+ as if the disturbance had not occurred; the test-window observations diverge from it sharply.

Baseline fit at an old-mining reference point — **Figure 3.1b — Old mining (already disturbed within the training window).** The pre-disturbance and post-disturbance signatures both sit inside the training record, so the baseline absorbs the historical change and the pixel is not re-flagged as new.

Baseline fit at a forest control point — **Figure 3.1c — Control (forest).** The fit tracks the seasonal cycle through every winter gap; the test-window observations sit inside the envelope and no detection is produced.

3.2 Empirical-variance map and where false candidates do (and do not) survive

The two-dimensional map of σ̂_emp over the area of interest is brightest along the active river channel and its floodplain and dim over the surrounding steppe — the geometry expected from §2.5. The interesting consequence is in the final detection layer rather than in the variance map itself. Along the river shore, the per-date z-score does occasionally rise, because river meanders shift and vehicle tracks appear along access roads. But at the 10 m pixel scale, the natural curling of the channel is a small geometric change relative to σ̂_emp, and the river-shore z-scores rarely accumulate to the fused threshold. Detections that do appear in the riparian band are concentrated at points consistent with ground disturbance — excavation pits and access tracks — rather than tracking the river edge itself.

Per-pixel empirical RMSE heatmaps across the four indices — **Figure 3.2 — Empirical residual variance σ̂_emp across the area of interest.** Channels and floodplains are bright (historically restless); surrounding steppe is dim (historically stable). The normalisation in §2.5 down-weights anomalies that occur where anomalies have always been allowed and up-weights them where they have not.

3.3 Phase-space separability of the four indices

The 3D scatter of (C_NDVI, C_MNDWI, C_BSI) — the per-index leaky CUSUM trajectories defined in §2.6 — coloured by the reference-point classes shows that mining pixels occupy a region of phase space distinct from natural river and forest pixels: large C_NDVI co-occurring with large C_BSI and elevated C_MNDWI. Natural river pixels lie along a different axis (elevated C_MNDWI without the BSI signature). This separation is the empirical justification for the strict-AND fusion in §2.6.

3D phase-space scatter of per-day leaky CUSUM scores — **Figure 3.3 — Phase-space separability.** Per-day leaky CUSUM scores plotted as a 3-D scatter, coloured by reference-point class. Mining and natural-river populations occupy disjoint regions; the strict-AND fusion exploits the fact that mining requires simultaneous excitation across all four indices, which natural disturbance does not produce.

3.4 Detection series

The pipeline produces one set of detection polygons per acquisition date across the full inference window. Reported per year: the number of dates carrying at least one detection polygon, the total area flagged, and the spatial concentration of detections along the active corridor (Table 1, to be inserted).

**Figure 3.4 — Per-pixel CUSUM trajectories at the reference points.** Mining pixels accumulate evidence after the training cut-off and cross the threshold; control pixels stay near zero across both windows.

3.5 Spatial CUSUM progression and the choice of threshold

The full-AOI output of the pipeline is illustrated at three representative dates spanning the train/test boundary. The fused score shown is the strict elementwise minimum across the four index-specific CUSUM cubes (§2.6), normalised by the dynamic GP predictive variance (§2.5). Pre-2021 slices stay near zero almost everywhere; post-2021 slices concentrate elevated fused scores along the active mining corridor. Reading the colour scale across the three panels, fused values above ≈5–6 reliably correspond to spatially coherent disturbance, while smaller values mostly populate noisy riparian pixels that fail to accumulate. This empirical separation is what fixes the §2.6 detection threshold of 6 on the fused score.

Strict-AND-fused per-day CUSUM heatmaps across three time slices — **Figure 3.5 — Spatial CUSUM progression, strict-AND fusion, dynamic GP-variance normalised.** Per-day leaky CUSUM (decay = 0.99432/day) fused by elementwise minimum across NDVI, BSI, MNDWI, NDTI, shown at three dates that span the 2021 train/test boundary. Pixels whose fused score exceeds ≈5–6 are spatially coherent and concentrated on the active corridor; below that band the response is dominated by riparian noise. This is the empirical basis for the §2.6 detection threshold of 6.

Appendix — Configuration summary

Training window: all observations before 1 January 2021. Inference window: 1 January 2021 onward.
Covariance kernel: periodic component with period fixed at one year, plus an RBF trend component with lengthscale fixed at five years. Each component has a learnable outputscale.
Mean function: a single learned constant per pixel.
Free parameters per pixel: two outputscales, observation noise, constant mean.
Standardiser: √(σ²_emp + σ²_GP + ε).
CUSUM per-day decay: 0.99432, half-life ≈ 121 days, clamped at zero.
Fusion: elementwise minimum across NDVI, BSI, MNDWI, NDTI accumulator cubes.
Detection threshold on fused score: 6.
Sign convention: NDVI score multiplied by −1; the other three indices use the raw sign.