Tag Archives: decomposition

More on fairy lights and volume decomposition (with ice cream included)

Last June, I wrote about representing five-dimensional data using a three-dimensional stack of transparent cubes containing fairy lights whose brightness varied with time, and also using feature vectors in which the data are compressed into a relatively short string of numbers [see ‘Fairy lights and decomposing multi-dimensional datasets’ on June 14th, 2023].  After many iterations, we have finally had an article published describing our method of orthogonally decomposing multi-dimensional data arrays using Chebyshev polynomials.  In this context, orthogonal means that the components of the resultant feature vector are statistically independent of one another.  The decomposition process consists of fitting a particular form of polynomials, or equations, to the data by varying the coefficients in the polynomials.  The values of the coefficients become the components of the feature vector.  This is what we do when we fit a straight line of the form y = mx + c to a set of values of x and y: the coefficients are m and c, which can be used to compare data from different sources instead of the datasets themselves.  For example, x and y might be the daily sales of ice cream and the daily average temperature, with different datasets relating to different locations.  Of course, it is much harder for data that are non-linear and vary with w, x, y and z, such as the intensity of light in the stack of transparent cubes with fairy lights inside.  In our article, we did not use fairy lights or ice cream sales; instead, we compared the measurements and predictions in two case studies: the internal stresses in a simple composite specimen and the time-varying surface displacements of a vibrating panel.
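For readers who like to see the idea in code, here is a minimal sketch (my own illustration, not the implementation from our article) of how fitting polynomials turns a dataset into a short feature vector: a straight-line fit gives two coefficients, while a Chebyshev fit of a chosen degree gives a slightly longer vector whose components can be compared between datasets.  The data values and the degree of the fit are invented for illustration.

```python
import numpy as np

# Invented example data: daily average temperature (x) and ice cream sales (y).
x = np.array([12.0, 15.0, 18.0, 21.0, 24.0, 27.0, 30.0])
y = np.array([30.0, 42.0, 55.0, 71.0, 90.0, 112.0, 140.0])

# A straight-line fit y = m*x + c gives a two-component "feature vector" (m, c).
m, c = np.polyfit(x, y, 1)

# A Chebyshev fit of higher degree captures non-linear behaviour; its
# coefficients form a longer feature vector.  Scaling x to [-1, 1] keeps the
# Chebyshev basis well behaved.
x_scaled = 2 * (x - x.min()) / (x.max() - x.min()) - 1
feature_vector = np.polynomial.chebyshev.chebfit(x_scaled, y, deg=4)

print("straight-line coefficients:", m, c)
print("Chebyshev feature vector:", feature_vector)
```

Two datasets from different locations could then be compared simply by comparing their feature vectors, rather than the raw measurements.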

The image shows the normalised out-of-plane displacements, represented by colour, as a function of time along the z-direction for the surface of a panel represented by the xy-plane.

Source:

Amjad KH, Christian WJ, Dvurecenska KS, Mollenhauer D, Przybyla CP, Patterson EA, Quantitative Comparisons of Volumetric Datasets from Experiments and Computational Models, IEEE Access, 11:123401-123417, 2023.

Fairy lights and decomposing multi-dimensional datasets

Many years ago, I had a poster that I bought when I visited North Cape in Norway, where in summer the sun never sets.  The poster was a time-series of 24 photographs taken at hourly intervals showing the height of the sun in the sky during a summer day at North Cape, similar to the thumbnail.  We can plot the height of the sun as a function of time of day, with time on the horizontal axis and height on the vertical axis, to obtain a graph that would be a sine wave, part of which is apparent in the thumbnail.  However, the brightness of the sun also appears to vary during the day, and so we could also conceive of a graph in which the intensity of a line of symbols represented the height of the sun in the sky, like a string of fairy lights in which we can control the brightness of each one individually – we would have a one-dimensional plot instead of a two-dimensional one.  If we had a flat surface covered with an array of lights – a chessboard with a fairy light in each square – then we could represent three-dimensional data, for instance the distribution of elevation over a field, using the intensity of the lights – just as some maps use the intensity of a colour to illustrate elevation.  We can take this concept a couple of stages further to plot four-dimensional data in three-dimensional space; for instance, we could build a three-dimensional stack of transparent cubes, each containing a fairy light, to plot the variation in moisture content in the soil at depths beneath as well as across the field.  The location of the fairy lights would correspond to the location beneath the ground and their intensity to the moisture content.  I chose this example because we recently used data on soil moisture in a river basin in China in our research [see ‘From strain measurements to assessing El Nino events’ on March 17th, 2021].  We can carry on adding variables and, for example, if the data were available, consider the change in moisture content with time and three-dimensional location beneath the ground – that’s five-dimensional data.  We could change the intensity of the fairy lights with time to show the variation of moisture content with time.  My brain struggles to conceive how to represent six-dimensional data, though mathematically it is simple to continue adding dimensions.  It is also challenging to compare datasets with so many variables or dimensions, so part of our research has been focussed on elegant methods of making comparisons.  We have been able to reduce maps of data – the chessboard of fairy lights – to a feature vector (a short string of numbers) for some time now [see ‘Recognizing strain’ on October 28th, 2015 and ‘Nudging discoveries along the innovation path’ on October 19th, 2022]; however, very recently we have extended this capability to volumes of data – the stack of transparent cubes with fairy lights in them.  The feature vector is slightly longer but can be used to track changes in condition, for instance in a composite component using computed tomography (CT) data, or to validate simulations of stress or possibly fluid flow [see ‘Reliable predictions of non-Newtonian flows of sludge’ on March 29th, 2023].  There is no reason why we cannot extend it further to six- or more-dimensional data, but it is challenging to find an engineering application, at least at the moment.
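As a concrete illustration of the stacked-cubes idea, the sketch below (my own, with invented values rather than the soil-moisture data mentioned above) stores four-dimensional data as a NumPy array indexed by three spatial positions and time, and renders one time step as a block of translucent cubes whose brightness follows the data – the digital equivalent of the fairy lights.

```python
import numpy as np
import matplotlib.pyplot as plt

nx, ny, nz, nt = 8, 8, 4, 24                 # grid size and number of time steps
rng = np.random.default_rng(0)
moisture = rng.random((nx, ny, nz, nt))      # synthetic stand-in for soil-moisture data

# Visualise one time step: each cube's brightness (alpha) is its data value,
# mimicking fairy lights whose intensity encodes the measurement.
snapshot = moisture[..., 0]
colors = np.zeros(snapshot.shape + (4,))
colors[..., 2] = 1.0                          # blue lights
colors[..., 3] = snapshot                     # transparency follows intensity

ax = plt.figure().add_subplot(projection="3d")
ax.voxels(np.ones_like(snapshot, dtype=bool), facecolors=colors)
plt.show()
```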

Photo by PCmarja2006 on Flickr

Million to one

‘All models are wrong, but some are useful’ is a quote, usually attributed to George Box, that is often cited in the context of computer models and simulations.  Working out which models are useful can be difficult, and it is essential to get it right when a model is to be used to design an aircraft, support the safety case for a nuclear power station or inform regulatory risk assessment on a new chemical.  One way to identify a useful model is to assess its predictions against measurements made in the real world [see ‘Model validation’ on September 18th, 2012].  Many people have worked on validation metrics that allow predicted and measured signals to be compared, and some result in a statement of the probability that the predicted and measured signals belong to the same population.  This works well if the predictions and measurements are, for example, the temperature measured at a single weather station over a period of time; however, these validation metrics cannot handle fields of data, for instance the map of temperature, measured with an infrared camera, in a power station during start-up.  We have been working on resolving this issue and we have recently published a paper on ‘A probabilistic metric for the validation of computational models’.  We reduce the dimensionality of a field of data, represented by values in a matrix, to a vector using orthogonal decomposition [see ‘Recognizing strain’ on October 28th, 2015].  The data field could be a map of temperature, the strain field in an aircraft wing or the topology of a landscape – it does not matter.  The decomposition is performed separately and identically on the predicted and measured data fields to create two vectors – one for the predictions and one for the measurements.  We look at the differences between these two vectors and compare them against the uncertainty in the measurements to arrive at a probability that the predictions belong to the same population as the measurements.  There are subtleties in the process that I have omitted but, essentially, we can take two data fields composed of millions of values and arrive at a single number to describe the usefulness of the model’s predictions.
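To make the workflow more tangible, here is a minimal sketch of the same sequence of steps in Python, with invented file names, a guessed fitting degree and a much cruder comparison than the probabilistic metric in our paper: both fields are decomposed identically into feature vectors using two-dimensional Chebyshev polynomials, and the component-wise differences are then judged against the measurement uncertainty.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def feature_vector(field, degree=5):
    """Fit a 2D Chebyshev polynomial to a data field and return its coefficients."""
    ny, nx = field.shape
    x = np.linspace(-1, 1, nx)
    y = np.linspace(-1, 1, ny)
    X, Y = np.meshgrid(x, y)
    basis = C.chebvander2d(X.ravel(), Y.ravel(), [degree, degree])
    coeffs, *_ = np.linalg.lstsq(basis, field.ravel(), rcond=None)
    return coeffs

measured = np.load("measured_field.npy")       # hypothetical data files
predicted = np.load("predicted_field.npy")
u_meas = 0.05                                  # assumed measurement uncertainty

diff = np.abs(feature_vector(predicted) - feature_vector(measured))
# Crude stand-in for the probabilistic comparison: the fraction of feature-vector
# components whose difference falls within the measurement uncertainty.
score = np.mean(diff <= u_meas)
print(f"fraction of components within uncertainty: {score:.2f}")
```

In the paper the comparison accounts for the distribution of the measurement uncertainty rather than a simple threshold, which is what yields a probability that the two fields belong to the same population.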

Our paper was published by the Royal Society with a press release, but it appeared in the same week as the proposed Brexit agreement, so I would like to think that it was ignored because of the overwhelming interest in the political storm around Brexit rather than because of its esoteric nature.

Source:

Dvurecenska K, Graham S, Patelli E & Patterson EA, A probabilistic metric for the validation of computational models, Royal Society Open Science, 5:1180687, 2018.