From Chocolate to Nobel Prize: Difference Between Correlation and Causation

Nov 18, 2024

When it comes to making decisions based on data, one of the most common pitfalls is confusing correlation with causation. It’s tempting to believe that when two things are statistically linked, one must be causing the other. But let’s take a step back and examine this through an amusing example involving chocolate and Nobel laureates, before looking at the correlation-versus-causation question with an industrial mindset.

A - The Chocolate-Nobel Laureate Correlation

Dietary flavonoids, abundant in plant-based foods, have been shown to improve cognitive function. Specifically, a reduction in the risk of dementia, enhanced performance on some cognitive tests, and improved cognitive function in elderly patients with mild impairment have been associated with a regular intake of flavonoids.

A subclass of flavonoids called flavanols, which are widely present in cocoa, green tea, red wine, and some fruits, seems to be effective in slowing down or even reversing the reductions in cognitive performance that occur with aging. Since chocolate consumption could hypothetically improve cognitive function not only in individuals but also in whole populations, it was worth investigating whether there would be a correlation between a country’s level of chocolate consumption and its population’s cognitive function.

Out of this question comes an intriguing graph that shows the number of Nobel laureates per 10 million people in various countries and their per capita chocolate consumption. Unsurprisingly, Switzerland—a country renowned for its chocolate—is ahead of the pack, while the USA, despite its many laureates, ranks low in terms of chocolate intake. Looking at this, you might be tempted to conclude that consuming more chocolate directly increases a country's likelihood of winning Nobel prizes.

Graph showing the correlation between chocolate consumption and Nobel Prize winners (*data from the Messerli paper*)

But is there really a causal link between indulging in sweets and achieving groundbreaking scientific breakthroughs? Of course not. What we’re seeing here is a correlation—a relationship between two variables—but one that doesn’t imply causation. The connection could be influenced by many unseen factors, such as a country’s economic development, educational systems, or even sheer coincidence. And that’s where we must tread carefully when interpreting data.
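To make the role of an unseen factor concrete, here is a minimal sketch using purely synthetic numbers (not the data behind the graph). It assumes a single hypothetical confounder, labelled `wealth`, that drives both chocolate consumption and laureate counts, while neither variable influences the other. The variable names and noise levels are illustrative assumptions only.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """Residuals of a least-squares straight-line fit of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical confounder (synthetic, unitless), e.g. national wealth.
wealth = [rng.gauss(0, 1) for _ in range(n)]

# Both variables depend on wealth; neither depends on the other.
chocolate = [w + rng.gauss(0, 1) for w in wealth]
laureates = [w + rng.gauss(0, 1) for w in wealth]

# Raw correlation between the two "effects".
raw_r = pearson(chocolate, laureates)

# Correlation after removing the part of each variable explained by wealth.
partial_r = pearson(residuals(chocolate, wealth), residuals(laureates, wealth))

print(f"raw correlation:              {raw_r:.2f}")
print(f"after controlling for wealth: {partial_r:.2f}")
```

In this toy setup the raw correlation is clearly positive even though chocolate has no effect on laureates by construction, and it largely vanishes once the confounder is accounted for. Real data rarely offer a single, cleanly measured confounder, so this is an illustration of the mechanism rather than a recipe.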

B - From Chocolate to Milk and Process Industry

Machine Learning & the Dairy Industry

Now, if we make the small leap from chocolate to milk chocolate, we land right in the heart of the dairy industry, where complex manufacturing processes take place. And this is where things get interesting—because understanding the difference between correlation and causation can have a direct impact on production outcomes.

At Intelecy, we had the opportunity to work with a manufacturer to optimize their production throughput, a goal worth noting these days, when most manufacturers are more interested in running at partial load than at nameplate capacity. The initial data revealed some correlations between certain operating parameters and output levels. Rather than settling for those correlations, the customer dug deeper into the data with the help of ML-driven tools. At the largest dairy in Norway, data analysis showed that about 43% of batches of a flagship cheese (a staple in many Norwegian homes, known for its mild and slightly sweet taste) had dry matter that was too high (red area), while the blue area showed the ideal range.

Graph from Intelecy: *Initial dry matter index distribution*

What the customer uncovered was a clear causal relationship: by tweaking just two specific parameters, the manufacturer could immediately increase its yield and drive up production efficiency in real time. Out of the 40 input parameters potentially involved in the deviations (process parameters, seasonal variability, machine settings…), three could explain 30% of the variance. One of them, the buffer tank hold-up time, could single-handedly bring the production dry matter KPI from 40% correct to more than 85%, simply by keeping the holding time under 53 min for each and every batch. Furthermore, slight modifications by the operators to actionable process parameters, such as filling time, helped boost that number further, to 95%.

The difference between thinking you know and knowing you know.

*Relation Tree Map recommending which parameters to modify; the blue side shows the ones that improve the selected KPI*
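As a rough illustration of how a rule like the 53-minute hold-up limit could be checked against batch records, the sketch below compares the in-spec rate on either side of the limit. The `Batch` record, the function names, and the six example rows are all hypothetical and made up for this sketch; they are not plant data and not Intelecy's method.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    hold_time_min: float  # buffer tank hold-up time, in minutes
    in_spec: bool         # True if dry matter fell in the ideal range


def in_spec_rate(batches):
    """Fraction of batches in spec, or None for an empty group."""
    if not batches:
        return None
    return sum(b.in_spec for b in batches) / len(batches)


def split_by_hold_time(batches, limit_min=53.0):
    """Return (in-spec rate under the limit, in-spec rate at or over it)."""
    under = [b for b in batches if b.hold_time_min < limit_min]
    over = [b for b in batches if b.hold_time_min >= limit_min]
    return in_spec_rate(under), in_spec_rate(over)


# Made-up records, only to exercise the functions above.
example = [
    Batch(45, True), Batch(50, True), Batch(52, False),
    Batch(58, False), Batch(61, False), Batch(70, True),
]

under_rate, over_rate = split_by_hold_time(example)
print(f"in spec when hold time < 53 min:  {under_rate:.0%}")
print(f"in spec when hold time >= 53 min: {over_rate:.0%}")
```

A split like this only shows an association in historical records. Confirming that hold-up time actually drives dry matter means changing it deliberately on the line and observing the result, which is the step that separates a causal finding from a correlation.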

Machine Learning & the Process Industry

The data-driven approach outlined above is not confined to a specific industry or sector. Instead, it represents a versatile framework using Intelecy that can be applied to a wide range of use cases, such as improving cooling tower efficiency, optimizing water consumption, balancing steam grids, reducing VOC emissions, refining refrigeration systems, enhancing yield, selectivity, and conversion rates in chemical reactions, or extending catalyst lifetimes.

For example, if you’re involved in formaldehyde production using silver catalyst technology on flat-bed reactors, with typical production cycles of 7 to 9 months before a catalyst change, do you know which parameters contribute most to your key performance indicators (KPIs)? And more importantly, do you understand how each of these factors individually affects the lifespan of your catalyst?

Is it the final product concentration and its slight variations between 49% and 52%? The methanol content, which gently fluctuates between 0.8% and 1.2%? The bleeding flow rate? The gas recycling bleeding ratio? The gas residence time? Or perhaps the surface area of packings in the absorption tower?

How can you adjust these parameters to ensure peak performance every single day?

*Chemical Site – APAC - MeOH Downstream and Intermediates*

This is the type of question that customers answer every day, not because they think they know but because they know they know.

C - Beyond Correlation

The chocolate and Nobel Prize example is a lighthearted reminder of why it's essential to think critically when interpreting data. While correlations can be intriguing, they often don’t tell the whole story. By using machine learning to uncover true causative relationships, companies can make better-informed decisions that lead to tangible results.

So, the next time you enjoy a piece of chocolate while contemplating your fouling heat exchangers, unstable mashing processes, shifting centerlines, imprecise filling line #2, slow batch release compounding lines, complex distillation columns, water-sensitive calcination units, unpredictable temperature profiles in dryers, or CO2-emitting LNG boilers, you can smile at the idea that maybe, just maybe, that chocolate might make you smarter. But remember, in the end, it’s vital to distinguish between coincidence and causation. And if you’re interested in diving deeper into this topic (and your data), feel free to reach out—perhaps while calculating just how much chocolate you’d need to boost your odds of a trip to Stockholm! 😊

At Intelecy, we're committed to helping manufacturing clients optimize their production with AI. Learn more about our No-code Industrial AI solution, or get in touch to book a demo.
