Why surrogate ML models for building physics

Building physics simulation is, at its best, a design tool. At its worst, it is a compliance checkbox ticked after the architect has already committed to a facade. The gap between those two outcomes usually comes down to one thing: how long a simulation takes relative to how long a design conversation lasts.

A Radiance daylight run on a single room takes around eight minutes on a capable machine. A TM59 overheating assessment in IES VE takes one to four hours per dwelling. An early-stage design meeting lasts forty-five minutes and spans ten ideas. The maths does not work. Teams either skip the physics entirely at concept stage, or they run one or two scenarios and pretend that is enough. The architectural cost appears later as facade rework, shading retrofits, and missed planning conditions — problems that were entirely knowable earlier, but not at the speed the workflow demanded.
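The mismatch can be put into numbers. The sketch below uses the runtimes quoted above and assumes, as a simplification, that runs happen back to back on a single machine; the function name is illustrative.

```python
import math

MEETING_MIN = 45      # length of an early-stage design meeting
IDEAS_ON_TABLE = 10   # ideas raised in that meeting


def runs_per_meeting(runtime_min: float, meeting_min: float = MEETING_MIN) -> int:
    """Whole simulations that finish inside one meeting, run sequentially."""
    return math.floor(meeting_min / runtime_min)


radiance_runs = runs_per_meeting(8)   # ~8 min per single-room daylight run
tm59_runs = runs_per_meeting(60)      # best case of the 1-4 hour range

print(f"Radiance: {radiance_runs} of {IDEAS_ON_TABLE} ideas tested")
print(f"TM59:     {tm59_runs} of {IDEAS_ON_TABLE} ideas tested")
```

Even in the best case, half the ideas in the room never see a daylight run and none see an overheating check.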

What a surrogate actually is

A surrogate model is a learned approximation of a slower process. You run the slow process (Radiance, IES VE, EnergyPlus, TAS) thousands of times across a representative parameter space, store every input–output pair as a dataset, then train a neural network to reproduce the mapping. At inference time, the network replaces the physics engine: same inputs, same kind of output, a fraction of the runtime.
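The three steps (sweep, store, train) can be sketched end to end in plain Python. Everything here is an assumption made for illustration: `slow_engine` is a toy formula standing in for a physics run, the two inputs (glazing ratio and a normalised room depth) are hypothetical, and the network is a deliberately tiny one-hidden-layer model trained by stochastic gradient descent, not the architecture of any tool described in this article.

```python
import math
import random


def slow_engine(wwr: float, depth: float) -> float:
    """Toy stand-in for a physics run (NOT real daylight physics)."""
    return wwr * math.exp(-1.5 * depth)


class TinyMLP:
    """n_in inputs -> n_hidden tanh units -> 1 linear output."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def sgd_step(self, x, y, lr):
        """One gradient step on the squared error for a single sample."""
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - y
        for j, hj in enumerate(h):
            # Backpropagate through tanh using w2[j] before it is updated.
            grad_z = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_z
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_z * xi
        self.b2 -= lr * err


def mae(model, data):
    return sum(abs(model.predict(x) - y) for x, y in data) / len(data)


rng = random.Random(0)


def make_dataset(n):
    """Step 1 and 2: sample the parameter space, store input-output pairs."""
    cases = [(rng.uniform(0.1, 0.8), rng.uniform(0.0, 1.0)) for _ in range(n)]
    return [(x, slow_engine(*x)) for x in cases]


train_set, test_set = make_dataset(400), make_dataset(100)

# Step 3: train the network to reproduce the mapping.
model = TinyMLP(n_in=2, n_hidden=8, rng=rng)
mae_before = mae(model, test_set)
for _ in range(100):
    rng.shuffle(train_set)
    for x, y in train_set:
        model.sgd_step(x, y, lr=0.05)
mae_after = mae(model, test_set)

print(f"held-out MAE: {mae_before:.3f} untrained, {mae_after:.3f} trained")
```

At inference time `model.predict((wwr, depth))` is called in place of `slow_engine(wwr, depth)`. In a real project the engine call is the expensive part and the dataset holds thousands of runs, but the shape of the workflow is the same.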

The network does not understand physics. It has compressed statistical patterns from a corpus of physics runs into weights. That distinction matters enormously for knowing when to trust it.

Three examples from practice

Daylight autonomy (CNN). The daylight surrogate at Hoare Lea was a U-Net CNN trained on 100,000+ synthetic Radiance runs covering UK residential geometries: room dimensions, glazing ratios from 10 to 80 per cent, external shading in three configurations, and four UK climate zones. The network takes a multi-channel image encoding of the room and produces a spatial daylight autonomy map. Accuracy on the held-out test set was ~3% MAE against Radiance, achieved in ~13 seconds of Azure ML GPU inference versus about eight minutes for the equivalent Radiance run, a speedup of roughly 37×. That is fast enough to close the gap between simulation time and meeting time: architects can explore dozens of options inside a single meeting.

Overheating risk (GNN). Residential overheating for TM59 and Part O compliance poses a harder geometry problem. Dwellings vary in room count, adjacency, and aspect; a CNN that treats spatial maps well loses that topology. A graph neural network handles it naturally — rooms become nodes, shared walls become edges, and the graph varies per dwelling without padding. Trained on 100,000+ parametric TAS runs (same Grasshopper base as the daylight tool, independent parameter sweep), the GNN reaches 97% pass/fail agreement with TAS and returns a prediction in under 10 seconds from Grasshopper. The equivalent TAS run takes one to four hours. What the GNN buys is earlier-stage checks: overheating can enter a concept-stage design review instead of waiting for an hours-long compliance run. (The £100K+/year IES VE licensing figure often quoted alongside this work belongs to a separate effort — the parametric TAS/SAM compliance engine — where retiring that licensing is a projected saving as the workflow matures, not one the GNN realizes on its own.)
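The rooms-as-nodes, walls-as-edges idea can be sketched as a small data structure. This is a minimal illustration, not the architecture of the GNN described above: the two per-room features (floor area and glazing ratio) are hypothetical, and the single round of mean-neighbour aggregation has no learned weights, whereas a real GNN layer would learn them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Dwelling:
    """Rooms are nodes; shared walls are undirected edges."""
    features: Dict[str, List[float]] = field(default_factory=dict)
    neighbours: Dict[str, Set[str]] = field(default_factory=dict)

    def add_room(self, name: str, feats: List[float]) -> None:
        self.features[name] = list(feats)
        self.neighbours.setdefault(name, set())

    def add_shared_wall(self, a: str, b: str) -> None:
        self.neighbours[a].add(b)
        self.neighbours[b].add(a)


def message_pass(d: Dwelling) -> Dict[str, List[float]]:
    """One round of aggregation: each room sees the mean of its neighbours."""
    out = {}
    for room, own in d.features.items():
        nbrs = [d.features[n] for n in sorted(d.neighbours[room])]
        if nbrs:
            mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        else:
            mean = [0.0] * len(own)   # isolated room: nothing to aggregate
        out[room] = own + mean        # own features + neighbourhood summary
    return out


# Hypothetical features per room: [floor_area_m2, glazing_ratio]
flat = Dwelling()
flat.add_room("living", [20.0, 0.5])
flat.add_room("bedroom", [12.0, 0.25])
flat.add_room("hall", [4.0, 0.0])
flat.add_shared_wall("living", "hall")
flat.add_shared_wall("bedroom", "hall")

studio = Dwelling()
studio.add_room("studio", [30.0, 0.5])

flat_out = message_pass(flat)
studio_out = message_pass(studio)
print(flat_out["hall"])      # hall now carries a summary of both neighbours
print(studio_out["studio"])
```

The same function handles the three-room flat and the one-room studio without padding either to a fixed size, which is the property a fixed-grid CNN input does not have.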

Peak solar load (feedforward network). BCO peak solar load for commercial facades is a lower-dimensional problem: floor area, window-to-wall ratio, G-value, U-value, UK location. A simple feedforward network is the right tool — no spatial map to produce, no variable-topology graph to traverse. Trained on 20,000+ EnergyPlus cases (themselves validated against IES VE on live projects), the network hits ~98% accuracy at ~1 minute per prediction, against roughly an hour for the EnergyPlus baseline. That workflow also replaced the IES VE dependency for commercial solar load checks, using an open-source EnergyPlus and OpenStudio stack as the ground truth.
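The speedups implied by the three examples follow directly from the quoted runtimes. The sketch below just does the division; the overheating figure is a lower bound, since the GNN is quoted as "under 10 seconds".

```python
# (label, baseline runtime in seconds, surrogate runtime in seconds)
# All figures are the ones quoted in the text above.
cases = [
    ("daylight CNN vs Radiance", 8 * 60, 13),
    ("overheating GNN vs TAS (1 h run)", 1 * 3600, 10),
    ("overheating GNN vs TAS (4 h run)", 4 * 3600, 10),
    ("solar load network vs EnergyPlus", 60 * 60, 60),
]

speedups = {label: base / fast for label, base, fast in cases}

for label, factor in speedups.items():
    print(f"{label}: ~{factor:.0f}x")
```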

The accuracy–coverage trade-off

What these three tools share is a well-defined training distribution. UK residential geometries. UK climate zones. Rectangular typical floors for commercial buildings. Glazing ratios in a specified range. G-values inside current Part L bounds. The accuracy figures (3% MAE, 97% pass/fail agreement, 98% accuracy) are properties of test sets drawn from that same distribution. They say nothing about what happens outside it.
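For reference, the two kinds of metric quoted here are simple to compute once a held-out set exists. The sketch below assumes MAE is measured in the same units as the output (for example, percentage points of daylight autonomy, which the figures above do not spell out), and the sample values are made up purely to exercise the functions; they are not results from any of the tools.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute gap between surrogate and physics outputs."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def pass_fail_agreement(predicted, reference):
    """Share of cases where surrogate and physics give the same verdict."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Made-up held-out values, for illustration only.
surrogate_da = [50.0, 62.0, 71.0, 40.0]
radiance_da = [52.0, 60.0, 75.0, 38.0]
surrogate_pass = [True, True, False, True]
physics_pass = [True, False, False, True]

mae = mean_absolute_error(surrogate_da, radiance_da)
agreement = pass_fail_agreement(surrogate_pass, physics_pass)
print(mae, agreement)
```

Both functions only ever see cases that someone chose to put in the held-out set. If that set is drawn from the training distribution, so is the score.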

This is the central trade-off of surrogate modelling: you buy speed at the cost of scope. A surrogate that is highly accurate inside its training envelope can fail silently outside it. The CNN does not know it has never seen a curved atrium. The GNN cannot flag that a fifteen-room mixed-use building with a double-height void is outside its residential typology training. The network returns a number either way.

Managing this trade-off in practice means three things. First, parametric sweeps that honestly cover the intended use cases before training starts, not after. Second, clear documentation of what the model was trained on, surfaced in the tool UI, not buried in a notebook. Third, routing genuinely novel cases — unusual geometries, new climate regions, performance targets outside the training range — back to physics.
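The third point, routing novel cases back to physics, can be sketched as an explicit envelope check in front of the surrogate. This is a minimal sketch under stated assumptions: the glazing range is the 10 to 80 per cent quoted earlier, while the room-depth bounds, the parameter names, and the `TrainingEnvelope` and `route` helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class TrainingEnvelope:
    """What the surrogate saw in training (values below are illustrative)."""
    bounds: Dict[str, Tuple[float, float]]   # numeric parameter ranges
    allowed: Dict[str, Set[str]]             # categorical values seen

    def violations(self, case: dict) -> List[str]:
        problems = []
        for key, (lo, hi) in self.bounds.items():
            value = case.get(key)
            if value is None:
                problems.append(f"{key}: missing")
            elif not lo <= value <= hi:
                problems.append(f"{key}={value} outside [{lo}, {hi}]")
        for key, seen in self.allowed.items():
            if case.get(key) not in seen:
                problems.append(f"{key}={case.get(key)!r} not in training set")
        return problems


def route(case: dict, envelope: TrainingEnvelope):
    """Send in-envelope cases to the surrogate, everything else to physics."""
    problems = envelope.violations(case)
    return ("physics", problems) if problems else ("surrogate", [])


ENVELOPE = TrainingEnvelope(
    bounds={"glazing_ratio": (0.10, 0.80), "room_depth_m": (2.5, 8.0)},
    allowed={"country": {"UK"}, "typology": {"residential"}},
)

typical = {"glazing_ratio": 0.45, "room_depth_m": 5.0,
           "country": "UK", "typology": "residential"}
novel = {"glazing_ratio": 0.95, "room_depth_m": 5.0,
         "country": "UK", "typology": "mixed_use"}

print(route(typical, ENVELOPE))
print(route(novel, ENVELOPE))
```

A box check like this is a floor, not a guarantee. It catches a glazing ratio the model never saw, but a curved atrium can sit inside every numeric bound and still be nothing like the training geometry, so the documented scope in the tool UI still has to do part of the work.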

When not to use a surrogate

The GPU-accelerated Radiance approach I built after the CNN daylight tool makes this concrete. Once the CNN had shipped, its failure mode became obvious in practice: novel geometries, unusual shading, and anything else outside the training dataset degraded prediction quality in ways that were hard to detect without re-running Radiance anyway. If you have to validate the surrogate against physics for a novel case, you have not saved time; you have added a step.

For a tool intended to handle global use cases, arbitrary geometries, and new climate files, a fast physics engine beats a surrogate. The GPU Radiance service (using the Cyclops engine) returns a full typical-floor daylight autonomy map in under 4 seconds at full raytracing accuracy. There is no training distribution to escape, no model drift to manage, no retraining cadence when building regulations change.

The decision between surrogate and accelerated physics is not about which is better in the abstract. It is about whether your use cases are bounded enough that a training corpus can cover them honestly. Residential UK overheating is bounded. Global commercial daylight at arbitrary geometry is not. The tool choice should follow that line.

What the surrogate earns

Where that boundary is honest, the surrogate earns its keep decisively. Sub-10-second overheating results at concept stage mean a sustainability engineer can sit in a design review and test the options on the table rather than reporting back three weeks later. A 13-second daylight map means facade and massing can be iterated before the geometry is locked. A 1-minute peak solar prediction means BCO compliance can enter the conversation at schematic design instead of detailed design.

The physics does not change. The question is whether teams get to use it while decisions are still open.