Oliver M Lean, Luca Rivelli, and Charles H Pence

As our group began to work on a recent project surrounding the use of digital tools and methods in the history and philosophy of biology, we noticed a tension that arises from two trends in recent philosophy of science. First are increasing demands for philosophy of science to hew closer to scientific practice. Gone are the days of rational reconstructions, numerous authors have argued, to be replaced by cautious attention to science as it is actually done and the varied and complex ways it comes to life in institutional, social, and laboratory contexts.

Second, and in no small part powered by the first, as more products of the scientific process—books, journal articles, conference programmes, laboratory notebooks, open datasets—become digitized, the temptation to use these as a rich source of empirical data about that very scientific practice grows. We have, often online and right at our fingertips, access to a variety and quantity of direct information about science that is unimaginable (and, increasingly, unmanageable).

Unsurprisingly, this has led to an explosion of work in what we might call digital empirical philosophy of science: the use of computer-aided analyses to unpack these resources, search them, process them, and in the end render them the kind of thing that could be useful for philosophers of science interested in drawing generalizations about how science is done. Such analyses have already yielded impressive results, and expanded our knowledge about everything from networks of communication and interaction in early modern Europe to structures of contemporary interdisciplinarity and scientists’ information sharing on social media.

What has not been as extensively explored, however, is just what licenses these kinds of inferences about scientific practice, and how exactly they can and should be used to ground work in empirical philosophy of science. As our group has begun to develop and use more of these tools ourselves, this lack of grounding has become worrying: How can we be sure that we’re really contributing to the kind of philosophy that we hope to build? What kinds of ‘best practices’ might we envision? Might it even be the case that we’ve moved away from philosophy itself, toward ethnography or anthropology of science (disciplines that are assuredly important, but for which we’re not qualified)? On these more empirical frontiers of the field, what really is philosophy’s distinctive character?

These are no small questions. And they’re made more difficult by the fact that at least some of the issues implicated in resolving them raise interdisciplinary concerns. To zoom in more precisely, consider the nature of digital literature analysis in particular, sometimes called text mining. To see what an analysis of the scientific literature might teach us about scientific practice requires us to unpack the nature of scientific publication, more often the domain of sociologists of science or scholars working in bibliometrics or scientometrics. Why do scientists publish? What roles does the scientific journal literature serve?

One step further removed, we also need to think about how we move from the analysis of terms in the literature itself to questions about the content of science—in essence, the shift from syntax to semantics. This has been, for decades, an animating concern of work in corpus and computational linguistics, a discipline with which philosophers have had unfortunately slight engagement. Moreover, the analysis of scientific literature—as opposed to corpora drawn from everyday speech, the media, and so on—is itself not a particularly common topic in linguistics. Even when we build these connections with other fields, that is, we may not find clear answers to our questions.

Our article is thus an outline of what this broad justificatory project might look like. We begin by considering the question of the relationship between the scientific literature and scientific practice. In particular, we focus on the challenge that has been posed to any analysis focused on scientific literature by practice-focused philosophy of science. Many scholars have been attracted to considering the philosophy of science-in-practice precisely because it is viewed as an alternative to our being bound by scientists’ own accounts in their journal articles, which may be distorted by a variety of personal, social, prestige, and publication pressures. If this new turn has improved philosophy of science by moving beyond its tight connections to the record as laid down in the literature, why would we want to return the focus to precisely those kinds of analyses?

In short, our response to this first challenge is that the naïve reading of this disconnect between the literature and practice can be defused if we consider what that ‘gap’ between literature and practice actually consists in. Of course, there are manifold problems with asserting that the journal literature simply is an accurate mirror of scientists’ actual beliefs and practices. Even scientists themselves readily admit that this picture would be too idealized. But there are a variety of different, competing ways to fill in the story about just what relationship the literature does in fact hold to scientific practice. There must be one: after all, scientists still hold journal clubs and read articles as a central part of laboratory life. We canvass a number of different options, arguing that on each, while the story about this relationship will certainly have to be sophisticated, there will remain ways in which the literature is communicating important evidence about science that philosophers need to take into account.

What about the latter connection? How can we move from analyses of the scientific literature (say, of the frequencies or co-occurrences of terms in a body of text, or the networks of citation or collaboration among authors in a journal over time) to generalizations about the content of science itself? Here, as well, there are differing answers in the literature on computational linguistics. Some place the burden on social factors and the community negotiation of the meanings of key terms, while others treat bodies of text as tools with which to create and test hypotheses about the relationship between language and the world. Again, no one doubts that broad-scale information about the uses of terms in the scientific literature teaches us something about the real process of doing science. But the story will need to be worked out in some detail, and the different ways that one might tell it will lead to different kinds of justification for different kinds of inferences about the scientific process.
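To make the raw material of such analyses concrete, here is a minimal sketch in Python of the two simplest measures mentioned above: term frequencies and co-occurrences of terms. Everything in it is an assumption made for illustration. The three-sentence corpus is invented, the function names are hypothetical, and counting two terms as co-occurring when they appear in the same document is only one of several possible definitions. It is not the method of any particular study.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: three invented sentences standing in for abstracts.
corpus = [
    "Natural selection acts on heritable variation in fitness.",
    "Genetic drift changes allele frequencies regardless of fitness.",
    "Selection and drift both alter allele frequencies.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Count how often each term occurs across the whole corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def cooccurrences(docs):
    """Count, for each pair of distinct terms, the documents containing both.

    Pairs are stored as alphabetically sorted tuples, so the key for
    'selection' and 'drift' is ('drift', 'selection').
    """
    pairs = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))
        pairs.update(combinations(terms, 2))
    return pairs


freqs = term_frequencies(corpus)
pairs = cooccurrences(corpus)

print(freqs["fitness"])
print(pairs[("allele", "frequencies")])
```

Counts like these are purely syntactic. Nothing in them says what ‘fitness’ means or how scientists use the term, which is exactly why the interpretive step from such numbers to claims about the content of science still needs its own justification.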

To justify the use of these kinds of methods in some detail, then, we need to consider what we call ‘packages’ of answers to these questions. An understanding of how the literature relates to scientific practice will constrain the kind of information that the literature could possibly contain. And an understanding of how we can draw generalizations from that information will, in turn, constrain the scope of the possible philosophical generalizations that might result from it. Examining these sets of questions together can therefore give us ways to approach exactly the question with which we began: how can digital analysis of the products of science help inform an empirical philosophy of science?

Of course, there is much more to do. We only very briefly evaluate a small collection of possible answers to these questions at the end of the article. For instance, one might take the view that scientific articles contribute to a shared narrative, with digital analyses contributing an analysis of the entrenchment of new concepts within that narrative. We co-authors disagree among ourselves about which of these packages is the right way to think about what these tools can provide, and more systematic evaluation and engagement with these questions is undoubtedly needed. But we hope that having laid out some of the questions here will encourage others to join us in the effort.


Lean, O. M., Rivelli, L. and Pence, C. H. [2023]: ‘Digital Literature Analysis for Empirical Philosophy of Science’, British Journal for the Philosophy of Science, 74, doi: 10.1086/715049

Oliver M. Lean
Université catholique de Louvain

Luca Rivelli
Université catholique de Louvain

Charles H. Pence
Université catholique de Louvain

© The Author (2021)

