Tag Archives: machine learning

Ancient models and stochastic parrots

In 2021 Emily Bender and her colleagues published a paper suggesting that the Large Language Models (LLMs) underpinning many Artificial Intelligence applications (AI apps) were little more than stochastic parrots.  They described an LLM as ‘a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning’.  This has fuelled the ongoing debate about the real capabilities of AI apps versus the hype from the companies trying to persuade us to use them.  Most AI apps are based on statistical analysis of data, as stated by Bender et al.; however, there is a trend toward physics-based machine learning, in which known laws of physics are combined with machine-learning algorithms trained on data sets [see, for example, the recent review by Meng et al., 2025].  We have been fitting data to models for a very long time.  In the fifth century BC, the Babylonians made perhaps one of the greatest breakthroughs in the history of science when they realized that mathematical models of astronomical motion could be used to extrapolate data and make predictions.  They had been recording astronomical observations since 3400 BC and the data was all collated in cuneiform in the library at Nineveh belonging to King Ashurbanipal, who ruled from 669 to 631 BC.  While our modern-day digital storage capacity in data centres might far exceed that of the clay tablets with cuneiform symbols found in Ashurbanipal’s library, it seems unlikely that our data will survive five thousand years and still be readable, as part of the Babylonians’ astronomical observations has done.
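To make the idea concrete, here is a minimal sketch of a physics-informed loss, assuming PyTorch; the network, data and decay constant are invented for illustration and are not taken from Meng et al.’s review.  A small neural network is fitted to a few noisy observations of exponential decay while the known governing law, u' = -ku, is enforced as a penalty at collocation points.

import torch

torch.manual_seed(0)
k = 1.5  # decay constant, assumed known from the governing law u' = -k*u

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)

# A few noisy observations: the purely statistical, data-driven part
t_data = torch.tensor([[0.0], [0.5], [1.0]])
u_data = torch.exp(-k * t_data) + 0.01 * torch.randn_like(t_data)

# Collocation points where the known physics is enforced
t_phys = torch.linspace(0.0, 2.0, 50).reshape(-1, 1).requires_grad_(True)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    loss_data = torch.mean((net(t_data) - u_data) ** 2)  # misfit to the data
    u = net(t_phys)
    du_dt = torch.autograd.grad(u.sum(), t_phys, create_graph=True)[0]
    loss_phys = torch.mean((du_dt + k * u) ** 2)  # residual of u' + k*u = 0
    (loss_data + loss_phys).backward()
    opt.step()

The combined loss constrains the model with the physics even where data is sparse, which is what distinguishes this approach from the purely statistical one described by Bender et al.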

References:

Bender, E.M., Gebru, T., McMillan-Major, A. and Shmitchell, S., 2021. On the dangers of stochastic parrots: Can language models be too big? 🦜. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610-623.

Meng, C., Griesemer, S., Cao, D., Seo, S. and Liu, Y., 2025. When physics meets machine learning: A survey of physics-informed machine learning. Machine Learning for Computational Science and Engineering, 1(1), p.20.

Wisnom, S., 2025. The Library of Ancient Wisdom. Penguin Books.

Image: Parrot in the park – free stock photo by Pixabay on Stockvault.net

Machine learning weather forecasts and black swan events

A couple of weeks ago I read about Google’s new weather forecasting algorithm, GraphCast.  It takes a radical new approach to forecasting by using machine learning rather than modelling the weather using the laws of physics [see ‘Storm in a computer‘ on November 16th, 2022].  GraphCast uses a graph neural network that has been trained on 39 years (1979–2017) of historical data from the European Centre for Medium-Range Weather Forecasts (ECMWF).  It requires two inputs, the current state of the weather and the state six hours ago, and then it predicts the weather six hours ahead with a 0.25-degree latitude-longitude resolution (about 17 miles) at 37 vertical levels.  This compares to ECMWF’s high-resolution forecasts, which have 0.1-degree resolution (about 7 miles), 137 levels and one-hour timesteps.  Although the training of the neural network took about four weeks on 32 Cloud TPU v4 devices (Tensor Processing Units), a forecast requires less than a minute on a single device, whereas ECMWF’s high-resolution forecast requires a couple of hours on a supercomputer.  Within a day or so of reading about GraphCast, we watched ‘The Day After Tomorrow’, a movie in which a superstorm suddenly plunges the entire northern hemisphere into an ice age with dramatic consequences.  Part of the movie’s message is that humanity’s disregard for the state of the planet could lead to existential consequences.  It occurred to me that the traditional approach to weather forecasting using the laws of physics might predict the onset of such a superstorm and avoid it becoming a black swan event; however, it is very unlikely that forecasts based on machine learning would predict it, because there is nothing like it in the historical record used to train the neural network.  So, for the moment, we should continue to use the laws of physics to model and predict the weather, since climate change appears to be making superstorms more likely [see ‘More violent storms‘ on March 1st 2017].
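For a feel of how such a forecast is produced, here is a minimal sketch of the autoregressive rollout described above; the function called model is a hypothetical stand-in for the trained graph neural network, and the toy grid is far coarser than GraphCast’s real 37-level, 721 x 1440 grid.

import numpy as np

N_LEVELS, N_LAT, N_LON = 5, 72, 144  # toy grid; the real one is 37 x 721 x 1440

def model(state_6h_ago, state_now):
    # Hypothetical stand-in for the trained graph neural network, which maps
    # the two most recent states to the state six hours ahead; naive linear
    # extrapolation is used here so that the sketch runs end to end.
    return state_now + (state_now - state_6h_ago)

def forecast(state_6h_ago, state_now, hours):
    # Roll the model forward in six-hour steps to build a longer forecast
    for _ in range(hours // 6):
        state_6h_ago, state_now = state_now, model(state_6h_ago, state_now)
    return state_now

previous = np.zeros((N_LEVELS, N_LAT, N_LON))      # state six hours ago
current = np.random.randn(N_LEVELS, N_LAT, N_LON)  # current state
ten_day_forecast = forecast(previous, current, hours=240)  # forty steps

Because each step is fed by the previous one, the rollout can only produce weather resembling the patterns in its training record, which is why a superstorm with no historical precedent would be beyond it.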

Sources:

Blum, A., The weather forecast may show AI storms ahead, FT Weekend, 18/19 November 2023.

Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu, W., Merose, A., et al., 2023. Learning skillful medium-range global weather forecasting. Science, doi: 10.1126/science.adi2336.

Image: Painting by Sarah Evans owned by the author.


Opportunities lost in knowledge management using digital technology

Regular readers of this blog will know that I occasionally feature publications from my research group.  The most recent was ‘Predicting release rates of hydrogen from stainless steel’ on September 13th, 2023, and before that ‘Label-free real-time tracking of individual bacterium’ on January 25th 2023 and ‘A thermal emissions-based real-time monitoring system for in situ detection of cracks’ in ‘Seeing small changes is a big achievement’ on October 26th 2022.  The subjects of these publications might seem a long way apart but they are linked by my interest in trying to measure events in the real world and use the data to develop and validate high-fidelity digital models.  Recently, I have stretched my research interests still further through supervising a clutch of PhD students with a relatively new collaborator working in the social sciences.  Two of the students have had their first papers published by the ASME (American Society of Mechanical Engineers) and the IEEE (Institute of Electrical and Electronics Engineers).  Their papers are not directly connected but they both explore the use of published information to gain new insights into a topic.  In the first one [1], we have explored the similarities and differences between safety cases for three nuclear reactors: a pair of research reactors, one fission and one fusion, and a commercial fission reactor.  We have developed a graphical representation of the safety features in the reactors and their relationships to the fundamental safety principles set out by the nuclear regulators.  This has allowed us to gain a better understanding of the hazard profiles of fission and fusion reactors that could be used to create the safety case for a commercial fusion reactor.  Fundamentally, this paper is about exploiting existing knowledge and looking at it in a new way to gain fresh insights, which we did manually rather than automating the process using digital technology.  In the second paper [2], we have explored the extent to which digital technologies are being used to create, collate and curate knowledge during and beyond the life-cycle of an engineering product.  We found that these processes were happening but generally not in a holistic manner.  Consequently, opportunities were being lost through not deploying digital technology in knowledge management to undertake multiple roles simultaneously, e.g., acting as repositories, transactive memory systems (group-level knowledge sharing), communication spaces, boundary objects (contact points between multiple disciplines, systems or worlds) and non-human actors.  There are significant challenges, as well as competitive advantages and organisational value to be gained, in deploying digital technology in holistic approaches to knowledge management.  However, despite the rapid advances in machine learning and artificial intelligence [see ‘Update on position of AI on hype curve: it cannot dream’ on July 26th 2023] that will certainly accelerate and enhance knowledge management in a digital environment, a human is still required to realise the value of the knowledge and use it creatively.
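As a toy illustration of the kind of graphical representation used in [1], safety features and fundamental safety principles can be treated as the two sides of a bipartite graph; this sketch uses Python with networkx, and the feature and principle names are invented for the example rather than taken from the paper.

import networkx as nx

# Illustrative names only; the actual features and principles are in [1]
principles = ["defence in depth", "containment of radioactive material",
              "passive shutdown"]
features = {
    "pressure vessel": ["containment of radioactive material",
                        "defence in depth"],
    "control rods": ["passive shutdown", "defence in depth"],
    "emergency cooling": ["defence in depth"],
}

G = nx.Graph()
G.add_nodes_from(principles, kind="principle")
for feature, linked_principles in features.items():
    G.add_node(feature, kind="feature")
    for principle in linked_principles:
        G.add_edge(feature, principle)

# Comparing reactor designs then reduces to comparing such graphs, e.g.
# checking which principles a design's set of features addresses
covered = {p for linked in features.values() for p in linked}
print(sorted(covered))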

References

  1. Nguyen, T., Patterson, E.A., Taylor, R.J., Tseng, Y.S. and Waldon, C., 2023. Comparative maps of safety features for fission and fusion reactors. Journal of Nuclear Engineering and Radiation Science, pp.1-24.
  2. Yao, Y., Patterson, E.A. and Taylor, R.J., 2023. The Influence of Digital Technologies on Knowledge Management in Engineering: A Systematic Literature Review. IEEE Transactions on Knowledge and Data Engineering.

Where is AI on the hype curve?

I suspect that artificial intelligence is somewhere near the top of the ‘Hype Curve’ [see ‘Hype cycle’ on September 23rd, 2015].  At the beginning of the year, I read Max Tegmark’s book, ‘Life 3.0 – being a human in the age of artificial intelligence’, in which he discusses the prospects for artificial general intelligence and its likely impact on life for humans.  Artificial intelligence means non-biological intelligence, and artificial general intelligence is the ability to accomplish any cognitive task at least as well as humans.  Predictions vary about when we might develop artificial general intelligence, but developments in machine learning and robotics have energised people in both science and the arts.  Machine learning consists of algorithms that use training data to build a mathematical model and make predictions or decisions without being explicitly programmed for the task.  Three of the books that I read while on vacation last month featured or discussed artificial intelligence, which stimulated my opening remark about its position on the hype curve.  Jeanette Winterson in her novel, ‘Frankissstein‘, foresees a world in which humanoid robots can be bought by mail order; while Ian McEwan in his novel, ‘Machines Like Me‘, goes back to the early 1980s and describes a world in which robots with a level of consciousness close to or equal to humans are just being introduced to the marketplace.  However, John Kay and Mervyn King in their recently published book, ‘Radical Uncertainty – decision-making beyond numbers‘, suggest that artificial intelligence will only ever enhance rather than replace human intelligence because it will not be able to handle non-stationary ill-defined problems, i.e. problems for which there is no objectively correct solution and which change with time.  I think I am with Kay and King, and that we will shortly slide down into the trough of the hype curve before we start to see the true potential of artificial general intelligence implemented in robots.
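As a minimal illustration of that definition of machine learning, in the sketch below, which assumes scikit-learn and uses made-up numbers, the ‘program’ is learned from example input-output pairs rather than written as explicit rules.

from sklearn.linear_model import LinearRegression

X_train = [[0.0], [1.0], [2.0], [3.0]]  # example inputs
y_train = [1.0, 3.0, 5.0, 7.0]          # example outputs (here, y = 2x + 1)

model = LinearRegression().fit(X_train, y_train)  # build the model from data
print(model.predict([[4.0]]))  # about 9.0, with no rule ever programmed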

The picture shows our holiday bookshelf.