Being able to analyze data properly has always been important, even before we entered the digital and big data era.
Data analysis is a very important skillset for scientists, because models are built on the results that we see in experiments, and if we are able to properly analyze our experimental data, we are able to formulate models that better represent reality.
But data analysis is only half the story.
The other half, the one that is most often forgotten, is that you also need to be able to:
Communicate your findings to others
Convince others that what you’ve found is indeed correct
This is also known as... storytelling.
This forgotten aspect of data can sometimes be so powerful that occasionally, people are able to convince us of "findings" that aren't even true, like, at all - not even a bit. It is then up to the community to use their ability to analyze the data, and to find any flaws in the original data analysis.
In this sense, analysis and storytelling should go hand-in-hand, one supporting the other, and together should lead to convincing and well-developed arguments.
However, sometimes, this can also be an ebb-and-flow system, where:
over-analysis messes up the story, or
the story is too far-fetched and the analysis doesn’t support that level of detail.
In these cases, it is important that the analytics and the story balance and complement each other.
>> The art of storytelling is therefore a skill worth mastering alongside analytical ability; both are required to deliver a great analytical report and to become a senior, more advanced (aka the ULTIMATE) data scientist.
So, Max, how do I learn this Voodoo magic that is storytelling?
Truth be told? It’s actually not as hard, or mysterious, as it sounds.
It's all about presenting your data.
And really, presenting your data in a convincing way is very similar to building a big, delicious chocolate cake.
Do you start baking before you have all your ingredients? No.
Do you start building your cake from the top layer? No.
Do you ice your cake before the cake has been assembled? No.
Do you eat the cake before it's finished? Yes. What? I mean, No.
There is a system. You start from the bottom layer, and you build upwards until you reach the top of the cake. One delicious step feeds into the next, and everything is well supported.
This is exactly how your storytelling should play out - one layer after the next, with your final conclusions fully supported by all the layers below.
>> But if you came here for a baking lesson, I'm sorry to disappoint. I’m not a baker, and this will not be an article telling you how to structure a cake, but rather how to scientifically structure a data analysis to convince people of your finding. <<
Step 1: State your assumptions
Every model has its assumptions, although they may not always be that obvious to find.
This often depends on where your data comes from. If you’re doing a statistical analysis of survey responses, your assumption is probably that you’ve got a representative sample (which is something you should think critically about, because treating a biased sample as representative of the population can lead you to false conclusions).
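To make the risk of a biased sample concrete, here is a minimal sketch using only Python's standard library. Everything in it is made up for illustration: the population is simulated, and the function name simulate_survey and its parameters are hypothetical. It compares the mean of a random sample with the mean of a sample in which only the highest-scoring members of the population "respond".

```python
import random
import statistics


def simulate_survey(seed=0, population_size=10_000, sample_size=500):
    """Compare a random sample with a deliberately biased one.

    All numbers are simulated; this is an illustration, not real survey data.
    """
    rng = random.Random(seed)

    # Hypothetical population: scores drawn from a normal distribution.
    population = [rng.gauss(50, 10) for _ in range(population_size)]
    true_mean = statistics.mean(population)

    # Representative sample: every member is equally likely to be picked.
    random_sample = rng.sample(population, sample_size)

    # Biased sample: only the highest-scoring members end up in the data.
    biased_sample = sorted(population)[-sample_size:]

    return (
        true_mean,
        statistics.mean(random_sample),
        statistics.mean(biased_sample),
    )


true_mean, random_mean, biased_mean = simulate_survey()
print(f"population mean:    {true_mean:.1f}")
print(f"random sample mean: {random_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
```

The random sample should land close to the population mean, while the biased sample lands far above it. If you assumed the second sample was representative, every conclusion built on it would inherit that error.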
For scientific models, you assume that the base theory holds; for example, when looking at the orbits of planets around the sun, you assume Newton’s laws of gravity are true (we don’t need any of Einstein’s fancy stuff here - Newton’s equations will do just as well).
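As a rough sketch of what "assuming the base theory holds" can look like in practice, the snippet below estimates a planet's orbital period from Newtonian gravity alone (Kepler's third law, T = 2π·sqrt(a³ / (G·M))). The assumptions are stated in the comments: a planet whose mass is negligible next to the sun's, no influence from other bodies, and approximate textbook values for the constants. The function name orbital_period_days is just a name chosen for this example.

```python
import math

# Assumptions (stated up front, as Step 1 asks):
# - Newtonian gravity holds; no relativistic corrections.
# - The planet's mass is negligible compared to the sun's.
# - Other bodies do not perturb the orbit.
# Constants are approximate textbook values.
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # mass of the sun, kg
AU = 1.496e11      # mean Earth-sun distance, m


def orbital_period_days(semi_major_axis_m, central_mass_kg=M_SUN):
    """Orbital period from Kepler's third law, in days."""
    seconds = 2 * math.pi * math.sqrt(
        semi_major_axis_m ** 3 / (G * central_mass_kg)
    )
    return seconds / 86_400


print(f"Earth: {orbital_period_days(AU):.1f} days")
```

With these inputs the result comes out at roughly one year for Earth, which is a quick sanity check that the Newtonian assumption is good enough for this kind of question.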
Example assumption: I assume that the sky is blue.
Optional Step 1b: Present the model you’re testing and state your Hypothesis
This may not always be applicable, because in some forms of analytics, you don’t have a developed model yet; rather, you’re testing the waters to see what you find, which can hopefully provide you with a basic structure that you can use to create a model.
However, if you have created a model, you should introduce it and explain it here, as well as state the hypothesis of what you expect to find/what you’re testing your model against.
Step 2: Lay the groundwork
Here, it is very important that you lay the groundwork by telling everyone a) what you’re analyzing, b) what your variables mean, and c) why you’re analyzing this.
The reason “I thought this might be interesting to investigate” is perfectly fine; exploratory work is, by definition, exploratory.
Not everything has to be predicted by a model. In fact, models are usually built on results seen in experiments, as that’s ultimately the final test to tell us what’s actually going on.
Step 3: Explain your findings for the first part of your analysis
Not only do you want to show people the data, but you also want to explain a) what analyses you’ve done on the data, b) why you did them, and c) what your findings were.
One thing to keep in mind is the importance of good data visualization. A great, self-consistent, and clear graph can save you a lot of time that you would otherwise spend on explanation. In fact, one theory predicts that such a picture can save you 1000 words.
If you're interested in learning how to make awesome data visualizations, you can check out my complete course on that here.
Along with stating your findings, you also want to answer the questions “what do these findings mean/imply?”, “do I trust my results/do they make sense?”, and “how can I construct further tests to better back up my results?”.
Sometimes you need several different forms of analytics before you can say “the evidence suggests this”. In that case, you can say something like “what we’re starting to see here is this…”
Step 4: Present and explain your second piece of analytics
You should approach this just like you did with your first piece, but ideally, your second piece of analytics should be related to your first piece, probably continuing on your overall analysis.
Continue building the layers of your cake...
This can be through approaching it from a different angle, or taking it one step further (either going out and looking at a broader source, or zooming in and looking at specific cases; each can provide us with potentially important information).
You can even make reference to your findings from your first piece of analytics, or you can save it till the end, where you link all your findings together.
Additional Steps: If you have more analyses to present, continue as in steps 3 - 4 until you’ve shown everything you deem important.
Step 5: Summarize your findings
Now that everyone has not only seen your analytics, but hopefully also understood it, you should repeat what you’ve analyzed, and summarize your key findings from each piece of analytics.
This is really all about cutting out the smaller details, and drawing the big conclusions out. Only the most pertinent of statements should stay here.
Step 6: Link your findings
With the summary, everyone is again made aware of all the individual findings you’ve made, and you can now use this to link them all together, giving a complete overview of your findings as well as bringing them together for context.
These are the general steps you should use to create a good and understandable story. Your story should ideally fill the room with a sense of enlightenment... but to start with, you can just aim to see fewer confused faces in the room staring back at you.
There are still many fine-grained aspects that are contained within each step. I’ve tried to provide examples for different scenarios you might encounter, but sometimes you may stumble into new territory.
One important thing to note from this article is that being able to visualize data well is a very important overall skill for storytelling, as there aren’t many people who get a lot of insight, let alone joy, from looking at a huge Excel sheet.
Plus, what's a good story without some nice pictures?