Gaussian processes — a model that knows where it's ignorant
Most models give you a number. A Gaussian process gives you a number and an honest confidence interval that tightens where you have data and widens where you don't — which turns out to be exactly what you need to decide what to measure next.
A Gaussian process (GP), also called kriging, is a probabilistic regression model that predicts a mean response plus an uncertainty at every point. That uncertainty is well calibrated when the model's assumptions fit the data. The uncertainty band pinches at observed data points (to near zero when measurements are noise-free) and balloons in the gaps between them, so the model explicitly represents where it's confident and where it's guessing. A kernel (covariance function) encodes how smoothly the response varies. Because it quantifies uncertainty, the GP is the engine behind active learning and Bayesian optimization, which use it to choose the most informative next experiment.
What makes a GP different
An ordinary fit returns one curve. A Gaussian process returns a distribution over curves consistent with your data, which can be summarized at each point by a mean prediction and a variance. The crucial property: at a point where you measured, the model is nearly certain (the band pinches); in a gap, the model admits it's interpolating (the band widens). That honesty about ignorance is the whole point, because it's information you can act on.
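In the standard formulation (zero prior mean, kernel k, Gaussian measurement noise with variance σ_n², training inputs x_1, …, x_n with responses y), the mean and variance at a new point x_* are:

$$
\mu(x_*) = \mathbf{k}_*^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y},
\qquad
\sigma^2(x_*) = k(x_*, x_*) - \mathbf{k}_*^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{k}_*
$$

Here K is the n×n matrix with entries K_ij = k(x_i, x_j), and k_* is the vector with entries k(x_i, x_*). At a measured point with σ_n² near zero, k_* equals a column of K and the variance drops to zero; far from all data, k_* shrinks toward zero and the variance returns to its prior value k(x_*, x_*). That is the pinch-and-widen behavior in equation form. The variance is for the underlying response; a prediction for a new noisy measurement adds σ_n².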
How to read it
1 · The mean is your prediction
The central line is the best estimate of the response across the factor space — a smooth surrogate you can query instead of running an experiment.
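A minimal NumPy sketch of querying the mean as a surrogate. It assumes a zero-mean GP with a squared-exponential (RBF) kernel whose hyperparameters are set by hand rather than fitted; the function names and the `np.sin` response are illustrative, not taken from any particular tool.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) covariance between two 1-D arrays of inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_query, length_scale=1.0, signal_var=1.0, noise_var=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at each query point."""
    n = len(x_train)
    K = rbf_kernel(x_train, x_train, length_scale, signal_var) + noise_var * np.eye(n)
    K_s = rbf_kernel(x_train, x_query, length_scale, signal_var)  # shape (n, n_query)
    L = np.linalg.cholesky(K)                                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))     # (K + noise*I)^-1 y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v**2, axis=0)   # prior variance minus what the data explain
    return mean, np.sqrt(np.maximum(var, 0.0))

# Three "experiments" on a hypothetical response, then query the surrogate
x_train = np.array([0.0, 1.0, 4.0])
y_train = np.sin(x_train)
x_query = np.array([1.0, 2.5])        # one measured point, one point in the gap
mean, std = gp_posterior(x_train, y_train, x_query)
print(mean, std)   # std is tiny at x=1.0 and much larger at x=2.5
```

The Cholesky factorization is the usual way to apply the inverse covariance without forming it explicitly; the small `noise_var` default also keeps the factorization numerically stable.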
2 · The band is calibrated confidence
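The ±2σ band is roughly a 95% interval (about 95.4% for a Gaussian; ±1.96σ gives exactly 95%). Narrow band → trust the prediction; wide band → the model is far from the data, whether in a gap or beyond the sampled range, and you should be cautious or go measure there.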
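The ±2σ rule assumes the prediction at each point is Gaussian. A quick standard-library check of how much probability a ±kσ band holds, plus an illustrative helper (the name `band` is hypothetical) that turns a predicted mean and standard deviation into band edges:

```python
import math

def gaussian_coverage(k):
    """Probability that a Gaussian draw lies within +/- k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

def band(mean, std, k=2.0):
    """Lower and upper edges of the +/- k*sigma band around a predicted mean."""
    return mean - k * std, mean + k * std

print(round(gaussian_coverage(2.0), 4))   # 0.9545
print(round(gaussian_coverage(1.96), 4))  # 0.95
print(band(1.0, 0.25))                    # (0.5, 1.5)
```

The coverage figure only holds if the model's uncertainty is itself calibrated; a misfit kernel or ignored noise makes the band look more (or less) trustworthy than it is.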
3 · The kernel sets the behavior
The covariance kernel encodes assumptions — how smooth the response is, how far one point's influence reaches (the length scale), and the noise level. Choosing and fitting the kernel (its hyperparameters) is what adapts the GP to your data.
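A sketch of what choosing and fitting the kernel can mean in practice, assuming a squared-exponential (RBF) kernel and a fixed noise level. Fitting here is a crude grid search that picks the length scale maximizing the log marginal likelihood, a standard GP criterion; practical tools usually optimize all hyperparameters with a gradient-based optimizer instead. The response `np.sin(2.0 * x)` is a made-up stand-in for real data.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var=1.0):
    """Squared-exponential kernel: smooth functions, influence decays over ~length_scale."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def log_marginal_likelihood(x, y, length_scale, signal_var=1.0, noise_var=1e-2):
    """log p(y | x, hyperparameters) for a zero-mean GP with Gaussian noise."""
    n = len(x)
    K = rbf_kernel(x, x, length_scale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # data-fit term, complexity penalty (log det K = 2 * sum(log diag L)), normalizing constant
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

# Correlation between two inputs one unit apart, for a short and a long length scale
one, other = np.array([0.0]), np.array([1.0])
print(rbf_kernel(one, other, 0.5)[0, 0])  # ~0.135: influence fades quickly
print(rbf_kernel(one, other, 2.0)[0, 0])  # ~0.882: influence reaches far

# Crude hyperparameter fit: keep the candidate length scale the data find most likely
x = np.linspace(0.0, 5.0, 12)
y = np.sin(2.0 * x)  # hypothetical smooth response
candidates = [0.1, 0.3, 1.0, 3.0]
best = max(candidates, key=lambda ell: log_marginal_likelihood(x, y, ell))
print(best)
```

The marginal likelihood balances fit against complexity: a very short length scale can explain anything but pays a complexity cost, while a very long one is too smooth to explain the data.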
4 · It drives the next experiment
Because you have both mean and uncertainty, you can pick the next run intelligently — explore where uncertainty is high, or exploit where the mean is promising. That's active learning / Bayesian optimization, and the GP is its engine.
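Two common acquisition functions, expected improvement (EI) and upper confidence bound (UCB), turn the mean and uncertainty into a single score per candidate. The text does not say which rule is used, so treat this as one illustrative choice. The sketch assumes the goal is to maximize the response; the candidate predictions and `best_so_far` are invented numbers.

```python
import math

def expected_improvement(mu, sigma, best_so_far, xi=0.0):
    """EI for maximization: expected amount by which a new run beats best_so_far."""
    if sigma <= 0.0:
        return max(mu - best_so_far - xi, 0.0)
    z = (mu - best_so_far - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best_so_far - xi) * cdf + sigma * pdf

def upper_confidence_bound(mu, sigma, beta=2.0):
    """UCB: optimistic score; a larger beta weights exploration more."""
    return mu + beta * sigma

# Hypothetical GP predictions at three candidate settings: (mean, std)
candidates = {"A": (0.80, 0.02), "B": (0.78, 0.20), "C": (0.50, 0.05)}
best_so_far = 0.79
next_ei = max(candidates, key=lambda c: expected_improvement(*candidates[c], best_so_far))
next_ucb = max(candidates, key=lambda c: upper_confidence_bound(*candidates[c]))
print(next_ei, next_ucb)  # B B
```

Candidate B wins under both rules even though A has the higher mean: A's small uncertainty leaves little room to improve, while B's wide band makes a better result plausible. Raising `beta` in UCB, or `xi` in EI, tilts the choice further toward exploration.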
Common pitfalls
- Wrong kernel. Too smooth a kernel washes out real features; too rough overfits noise. The kernel encodes a prior — choose it deliberately.
- Trusting the mean in wide-band regions. A confident-looking mean line in a data gap is the least reliable part of the model; read the band.
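- Ignoring noise. If measurements are noisy, the GP must model that noise (typically with a noise-variance term added to the kernel); otherwise the band collapses to near zero at data points, overstating confidence, and the mean is forced through every noisy value.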
- Scaling. Exact GP inference costs O(n³) time and O(n²) memory in the number of observations n. That is an issue for big data, but rarely for the modest run counts of physical experiments.
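A small illustration of the noise pitfall, assuming an RBF kernel with unit signal variance and length scale 1: the same three inputs are fitted once with a negligible noise term and once with a noise variance of 0.1. The printed value is the posterior standard deviation of the underlying response at a measured point; the band for a new noisy measurement would be wider still, since it adds the noise variance.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

def posterior_std(x_train, x_query, noise_var):
    """Posterior std of the latent response at x_query (the y values do not affect it)."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf(x_train, x_query)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return np.sqrt(np.maximum(var, 0.0))

x_train = np.array([0.0, 2.0, 4.0])
at_data = np.array([2.0])                                  # a point that was measured
std_clean = posterior_std(x_train, at_data, noise_var=1e-8)
std_noisy = posterior_std(x_train, at_data, noise_var=0.1)
print(std_clean)   # close to 0: the band pinches hard
print(std_noisy)   # clearly above 0: the model admits measurement noise
```

The `np.linalg.solve` on the n×n matrix is also where the O(n³) cost in the scaling pitfall comes from.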
Where this gets slow by hand
Choosing a kernel, fitting its hyperparameters, validating the model, reading off where uncertainty is high, and translating that into the most informative next experiment — then updating as each result lands — is an iterative, specialist workflow. Doing it by hand between every experiment is exactly the loop that slows materials and process optimization down.
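The loop itself is short once the pieces exist. A minimal sketch, assuming a fixed RBF kernel (no hyperparameter refitting or validation between runs), a hypothetical `run_experiment` function standing in for the lab, and the simplest pure-exploration rule: measure wherever the predicted standard deviation is largest. Swapping that rule for an acquisition function such as expected improvement gives the explore/exploit balance described earlier.

```python
import numpy as np

def rbf(a, b, length_scale=0.5):
    """Unit-variance squared-exponential kernel with a fixed, hand-picked length scale."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

def fit_predict(x, y, grid, noise_var=1e-4):
    """Posterior mean and std on the grid for a zero-mean GP."""
    K = rbf(x, x) + noise_var * np.eye(len(x))
    K_s = rbf(x, grid)
    mean = K_s.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def run_experiment(x):
    """Hypothetical stand-in for a real measurement."""
    return np.sin(3.0 * x) * np.exp(-x)

grid = np.linspace(0.0, 2.0, 201)   # candidate settings of one factor
x = np.array([0.0, 2.0])            # initial runs at the ends of the range
y = run_experiment(x)
for step in range(6):
    mean, std = fit_predict(x, y, grid)      # refit on everything measured so far
    x_next = grid[np.argmax(std)]            # explore: measure where the model is least sure
    x = np.append(x, x_next)
    y = np.append(y, run_experiment(x_next))
    print(f"step {step}: measured x={x_next:.2f}, max std before run={std.max():.3f}")
```

Each pass through the loop is one "fit, read, recommend, update" cycle; the maximum standard deviation printed at each step shrinks as the runs fill the gaps.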
A surrogate that recommends the next experiment
Niobia fits the Gaussian process — choosing and tuning the kernel, modeling measurement noise, and validating it — then uses the calibrated uncertainty to recommend the most informative next experiment, balancing exploring high-uncertainty regions against exploiting promising ones. As each result lands, the surrogate updates and the band tightens where it matters. The specialist fit-read-recommend loop runs continuously, so the experimental program converges on the optimum in fewer runs.
Frequently asked
What is a Gaussian process in machine learning?
A Gaussian process (GP) is a probabilistic regression model that defines a distribution over functions. Given data, it predicts a mean response and a calibrated uncertainty at every point, with the uncertainty pinching at observed points (to near zero for noise-free data) and widening in the gaps. It's also known as kriging in geostatistics.
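"A distribution over functions" has a concrete meaning: at any finite set of inputs, the function values are jointly Gaussian with covariance given by the kernel. A minimal sketch that draws three random functions from a zero-mean GP prior with an RBF kernel (length scale 1, chosen arbitrarily) on a grid:

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
K = rbf(x, x) + 1e-6 * np.eye(len(x))   # small jitter keeps the Cholesky factorization stable
L = np.linalg.cholesky(K)
samples = L @ rng.standard_normal((len(x), 3))   # each column is one random function on the grid
```

Plotting the columns of `samples` against `x` shows smooth random curves whose wiggliness is set by the length scale; conditioning on data is what narrows this prior down to curves that pass near the measurements.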
Why is the uncertainty band useful?
Because it tells you where the model is reliable and where it's guessing. A narrow band means you can trust the prediction; a wide band means the model is far from the data, in a gap or beyond the sampled range, and you should measure there. This is what enables active learning and Bayesian optimization to choose informative experiments.
What is the kernel in a Gaussian process?
The kernel, or covariance function, encodes assumptions about how the response varies — how smooth it is, how far the influence of one observation reaches (the length scale), and the noise level. Choosing and fitting the kernel's hyperparameters adapts the GP to the data.
