QueenOfData

Simpson’s Paradox: What’s So Paradox About The Simpsons?

One of Tableau’s main strengths lies in aggregation. The user holds control over the level of detail at which they want to view the data. But data sets such as Anscombe’s Quartet teach us that a data insight at an aggregate level will not always hold under a more detailed point of view. The same goes for Simpson’s paradox.

What is Simpson’s paradox?

Simpson’s paradox is something you may or may not have heard of, but you should definitely know about it. Also known as the Yule-Simpson effect or reversal paradox, it describes the phenomenon of a trend appearing in a high-level aggregation of data, but disappearing or reversing in subgroups of the same data, or vice versa.

Let’s take a look at one of two famous examples: the success of two different procedures for the removal of kidney stones. When looking at the different sizes of stones, we can see that the more invasive procedure, open surgery, is more successful for small stones and for large stones alike. But when viewing the overall success of the two procedures, percutaneous nephrolithotomy comes out with the higher success rate. Why this difference?

We are neglecting the weight of the different groups. We find the highest numbers of cases in the group of large kidney stones removed by surgery, and the small stones removed via nephrolithotomy. Large stones are harder to remove whichever procedure is used, so this puts more weight on the less successful group for surgeries and on the more successful group for nephrolithotomies. As a result, the overall success rate of each procedure is reversed compared to the pattern shown at the lower level of detail of stone size.
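
To see the reversal in numbers, here is a minimal Python sketch. The case counts are an assumption on my part: they are the figures commonly cited for this example (Charig et al., 1986) and are not given in this post. The helper name success_rate is made up for illustration.

```python
# Commonly cited figures for the kidney stone example (Charig et al., 1986).
# Assumption: these counts are not stated in the post itself.
# Each entry maps (procedure, stone size) to (successes, total cases).
cases = {
    ("open surgery", "small"): (81, 87),
    ("open surgery", "large"): (192, 263),
    ("nephrolithotomy", "small"): (234, 270),
    ("nephrolithotomy", "large"): (55, 80),
}


def success_rate(procedure, size=None):
    """Success rate for a procedure, optionally restricted to one stone size."""
    successes = total = 0
    for (proc, stone_size), (s, n) in cases.items():
        if proc == procedure and size in (None, stone_size):
            successes += s
            total += n
    return successes / total


for procedure in ("open surgery", "nephrolithotomy"):
    small = success_rate(procedure, "small")
    large = success_rate(procedure, "large")
    overall = success_rate(procedure)
    print(f"{procedure:16s} small {small:.0%}  large {large:.0%}  overall {overall:.0%}")
```

With these counts, open surgery wins within each stone size (about 93% vs 87% for small stones, 73% vs 69% for large ones) yet loses overall (about 78% vs 83%), because most of its cases sit in the harder large-stone group.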

What can we do?

All of this is quite difficult to show in pie charts, and comparing multiple pies is bad practice anyway (for more info on this statement, check out my TFF EMEA 2019 talk: Be Rational!). It’s better to use bar charts for easier comparison, but then we lose the weight of the sample size again.

Wouldn’t it be simply brilliant to have the advantage of stacked bars showing the percentage, and add to that a variable size of the individual bars depending on the sample size?

You are in luck: there is in fact such a thing! Allow me to introduce you to my favourite chart type – the Marimekko chart.
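
Outside of Tableau, the geometry behind such a chart is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how Tableau builds it: the function name marimekko_layout is hypothetical, and the counts are the commonly cited open surgery figures from the kidney stone example (an assumption, as they are not given in this post). Column width encodes the sample size, segment height encodes the percentage within the column.

```python
def marimekko_layout(groups):
    """Return one rectangle per segment as (group, segment, x, y, width, height).

    groups maps a group label to {segment label: count}. Column widths are
    each group's share of all cases; segment heights are shares within the
    group. All coordinates are fractions of a unit square.
    """
    grand_total = sum(sum(segments.values()) for segments in groups.values())
    rectangles = []
    x = 0.0
    for group, segments in groups.items():
        group_total = sum(segments.values())
        width = group_total / grand_total  # wider column = more cases
        y = 0.0
        for segment, count in segments.items():
            height = count / group_total  # stacked to 100% within the column
            rectangles.append((group, segment, x, y, width, height))
            y += height
        x += width
    return rectangles


# Assumed counts for open surgery, split by stone size and outcome.
open_surgery = {
    "small stones": {"success": 81, "failure": 6},
    "large stones": {"success": 192, "failure": 71},
}

for rect in marimekko_layout(open_surgery):
    print(rect)
```

The rectangles can be handed to any drawing library. The point is that the large-stone column ends up roughly three times as wide as the small-stone column, so the weight of each group stays visible next to its success percentage.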

Please note: The title image of this post has nothing to do with its content whatsoever, in case you were wondering. I was looking for a photo related to paradox, but there were none. Then I tried for a photo about the Simpsons, but only found some very inappropriate graffiti. So I looked for photos featuring the colour yellow and found this. Since I’ve been told by the internet that animals draw more clicks, I went for this one. Also, the alternative would have been a bucket full of Lego figure heads, and I found those to be a bit unsettling.