Florian J. Boge

Experimental high-energy physics, the study of material particles colliding at tremendous energies, is rife with simulations and simulated data. This fact has been brought to the fore most notably by the late Margaret Morrison ([2015], p. 314), based on discussions with physicists who are now members of our research unit, The Epistemology of the Large Hadron Collider.

Experimental high-energy physics is certainly not the only field of empirical research in which computer simulations abound: climate science and astronomy readily come to mind as further examples. Yet we all know that simulations rely on models, and models rely on simplifying and idealizing assumptions. Worse still, the models underlying computer simulations are often especially contrived, not least to compensate for artefacts introduced by the implementation on a physical computer. And on top of all this, most simulation models contain various free parameters that need to be fitted to data.

With all these assumptions, artefacts, and parameters around, how could an experimenter possibly trust her experimental results? I believe that the key to this lies in robustness analysis, but one must be cautious as to what such an analysis can establish about the models used.

Robustness analysis is the process of varying the means by which a theoretical or experimental result is detected, and then of observing whether it remains invariant. It has a long history in philosophy of science, and has recently been elegantly analysed by Jonah Schupbach ([2016]) as a kind of explanatory reasoning: by successively ruling out that the result depends on a specific means, we successfully eliminate various hypotheses that would have it be an artefact of a given detection method (model or experimental setup). In this way, we incrementally ensure that the result is best explained as a feature of the system under study.

Kent Staley ([2020]) has, in part, already shown that high-energy physicists establish trust in this way. As he argues—correctly, according to textual evidence and private communication with practitioners—in varying the modelling assumptions underlying an experimental inference, we establish a measure for the result’s uncertainty on the basis of the variability that we detect. And a sensible uncertainty measure is key to establishing a trustworthy result, as it offers a weaker version of the result that is independent of the respective modelling assumptions.

High-energy physicists, for instance, have a range of models for hadronization, the process in which quarks and gluons form bound states (‘hadrons’). Some models treat the colour field that binds quarks as a flexible, tearable string; others neglect it almost completely and focus on the fact that quarks should end up ‘clustered’ into meta-stable colourless states; yet others pursue further treatments. The spread of results generated under these different models is then used to pragmatically define a ‘hadronization uncertainty’. And since the modelling assumptions differ from one another, this ensures that the result is not strongly dependent on any of these assumptions.
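To make the idea of a pragmatically defined ‘hadronization uncertainty’ concrete, here is a minimal sketch in Python. It assumes one simple convention, namely taking the largest deviation of any alternative model from a nominal one; actual analyses may define the spread differently. The model names and numerical values are purely hypothetical.

```python
def model_spread_uncertainty(results, nominal):
    """Largest absolute deviation of any alternative model from the nominal one.

    `results` maps a model name to the value of the same quantity inferred
    under that model. The 'max deviation' convention is an assumption made
    for illustration, not a prescription taken from any collaboration.
    """
    central = results[nominal]
    return max(
        (abs(value - central) for name, value in results.items() if name != nominal),
        default=0.0,  # no alternative models: no spread to report
    )


# Hypothetical values of one quantity inferred under three hadronization models
results = {"string": 172.5, "cluster": 172.9, "other": 172.3}

uncertainty = model_spread_uncertainty(results, nominal="string")
print(f"{results['string']:.1f} +/- {uncertainty:.1f} (hadronization)")
```

The weaker, interval-valued result reported at the end is what remains stable across the modelling assumptions that were varied.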

Hadronization is but one of many parts to the complex events going on at colliders such as the Large Hadron Collider, and many further models, with assumptions and parameters of their own, must be considered. In my article, I have called the successful outcome of varying models in this way ‘conceptual robustness’.

But varying assumptions is not all that high-energy physicists do. They also vary the processes and detector effects, and even how simulated data enter into analysis. For instance, in the template method, a range of input values for a quantity of interest is used to generate histograms, and these are used as ‘templates’ for what actual histograms should look like if the quantity takes on a given value. In the matrix element method, one instead calculates from the theory as far as possible, and predicts the states to be measured in the detector from a parametric ‘response function’. The parameters of that function are, however, usually fixed using simulated data, so computer simulations still enter indirectly. And they also enter negatively, in defining the distribution of ‘background’ events.
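The logic of the template method can be illustrated with a toy sketch in Python. Everything here is an assumption made for illustration: the ‘simulation’ is just a Gaussian smearing around the input value, the candidate values and bin ranges are invented, and the comparison uses a simple chi-square statistic rather than whatever fit a real analysis would employ.

```python
import random


def histogram(samples, lo, hi, nbins):
    """Count samples in `nbins` equal-width bins on [lo, hi)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for x in samples:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), nbins - 1)] += 1
    return counts


def simulate(input_value, n_events, rng):
    """Toy stand-in for a full event simulation: Gaussian smearing."""
    return [rng.gauss(input_value, 2.0) for _ in range(n_events)]


def chi_square(observed, template, scale):
    """Compare observed counts with a template rescaled to the same size."""
    total = 0.0
    for obs, tmpl in zip(observed, template):
        expected = tmpl * scale
        if expected > 0:
            total += (obs - expected) ** 2 / expected
    return total


rng = random.Random(0)
LO, HI, NBINS = 160.0, 185.0, 25
N_TEMPLATE, N_OBSERVED = 20000, 2000

# One simulated template histogram per candidate input value
candidates = [170.0, 171.0, 172.0, 173.0, 174.0, 175.0]
templates = {
    value: histogram(simulate(value, N_TEMPLATE, rng), LO, HI, NBINS)
    for value in candidates
}

# Stand-in for the actual data, generated here with a 'true' value of 173.0
observed = histogram(simulate(173.0, N_OBSERVED, rng), LO, HI, NBINS)

scale = N_OBSERVED / N_TEMPLATE
best = min(candidates, key=lambda v: chi_square(observed, templates[v], scale))
print("best-matching template input value:", best)
```

The point of the sketch is only that simulated data enter directly here, as the yardstick against which the measured histogram is read; this is what distinguishes the template method from the matrix element method, where simulations enter indirectly.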

Because these different ways of using simulated data discriminate against several distinct hypotheses as to the artefactual nature of the result, success in this case certainly secures a stronger basis for trusting simulation-infected experimental results—something I have called ‘methodological robustness’. However, so far, the role of free parameters has not been addressed.

One may wonder why free parameters should be problematic at all. The standard model of high-energy physics, for instance, counts as science’s best confirmed theory to date, and has twenty-seven free parameters. Whether parameters are harmless in this way depends, for the most part, on two things: whether they are physical quantities and whether they have experimentally robust values.

It appears that in the standard model, free parameters are physical quantities that have experimentally robust values; but this is not true of the simulation models used to test the standard model. Each of the Large Hadron Collider collaborations, such as ATLAS or CMS, has a whole range of ‘tunes’ for the same model, which are used for specific purposes. Furthermore, in a remarkable study, Corcella et al. ([2018]) have suggested that measurement accuracy can be increased by tuning in situ—that is, on the very data to be analysed. Clearly, this must be done in a way that evades circularity, and this requires the identification of dedicated ‘calibration observables’, describing the relevant process without being strongly dependent on the quantity to be measured.

Such a procedure may increase accuracy, but it certainly also marks the failure of what I have called ‘inverse parametric robustness’: the requirement that free parameters have experimentally robust values. I call it ‘inverse’ in this context because it goes from experimental data to the simulation models used in experimental analysis. The existence of different tunes furthermore shows that the values of these parameters generally need to be varied with the context of application if one wants to use the models successfully.

All of this certainly seems strange if one thinks of these models as faithfully representing aspects of real scattering events, and their parameters as representing physical magnitudes. But it is perfectly comprehensible if one thinks of the models as cognitive instruments: theoretical entities that facilitate inferences without being faithful representations of an underlying reality.

This is best seen when the tuning of a model is compared to Eran Tal’s ([2012]) account of calibration. On this account, a function needs to be determined that represents the connection between the properties of the instrument and its readout in measurements, relative to a standard indication. Since some instruments have flexible properties, such as the adjustable links on a pressure gauge, adjusting the parameters of a simulation model may be seen as tweaking the flexible properties of an instrument.

Now when a simulation model has been calibrated in this way, either in situ or to some relevantly similar experimental data, the resulting tune may be used together with other tunes to ensure that the result does not strongly depend on any particular tune—something I have called ‘parametric robustness’. But if such procedures regularly lead to consistent results, while being safe from circularity, one may also think that this endows the models with a sense of reliability, at least in their function as cognitive instruments.

All of this gives us a feeling for why it might be epistemically acceptable that physicists use models in these ways; and it may also secure our trust in the experimental results of high-energy physics, in the sense that they will probably not be completely overthrown by the next, bigger collider, say. But it also forcefully reminds us that we should be careful not to hastily attribute ‘truth’ to the posits of a theoretical entity just because it is successfully employed in furthering empirical knowledge.



Boge, F. J. [2024]: ‘Why Trust a Simulation? Models, Parameters, and Robustness in Simulation-Infected Experiments’, British Journal for the Philosophy of Science, 75, doi: 10.1086/716542.

Florian J. Boge
University of Wuppertal


Corcella, G., Franceschini, R. and Kim, D. [2018]: ‘Fragmentation Uncertainties in Hadronic Observables for Top-Quark Mass Measurements’, Nuclear Physics B, 929, pp. 485–526.

Morrison, M. [2015]: Reconstructing Reality: Models, Mathematics, and Simulations, Oxford: Oxford University Press.

Schupbach, J. N. [2016]: ‘Robustness Analysis as Explanatory Reasoning’, British Journal for the Philosophy of Science, 69, pp. 275–300.

Staley, K. W. [2020]: ‘Securing the Empirical Value of Measurement Results’, British Journal for the Philosophy of Science, 71, pp. 87–113.

Tal, E. [2012]: ‘The Epistemology of Measurement: A Model-Based Account’, Ph.D. Thesis, University of Toronto.

© The Author (2021)

