The problem with the mean

Here is an outlier for you.

Jeff Bezos 

When it comes to income, Mr Bezos has few peers. Only a handful of individuals earn more in a second than most regular folks do in a month.

According to the Australian Bureau of Statistics, which conducts regular and rigorous surveys, average weekly ordinary time earnings for full-time adults in Australia (seasonally adjusted) were $1,737 in May 2021, across a workforce of 13 million people.

What would happen to the average weekly income of Australians if Mr Bezos went down under and we added his weekly ‘wage’ to the calculation of the mean?

The average weekly wage would grow by $116, or roughly 7%.

A hundred bucks a week more from one outlier.
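
To see where a figure like that comes from, here is a rough sketch of the arithmetic in Python. The article does not state the weekly 'wage' used for Mr Bezos, so the code works backwards from the $116 rise to the income that rise implies. The function names are made up for illustration, and the result is an implied figure, not a reported one.

```python
# Sketch only: how one newcomer shifts a mean, using the article's figures.
# The newcomer's income is NOT given in the article; it is back-calculated here.

def mean_shift(n, mean, newcomer):
    """Change in the mean when one extra value joins n values with this mean."""
    # New mean is (n * mean + newcomer) / (n + 1); subtracting the old mean gives:
    return (newcomer - mean) / (n + 1)

def implied_newcomer(n, mean, shift):
    """Value the newcomer must have to move the mean by `shift`."""
    return mean + shift * (n + 1)

WORKFORCE = 13_000_000   # people, from the article
WEEKLY_MEAN = 1_737      # dollars per week, from the article
RISE = 116               # dollars per week, from the article

implied = implied_newcomer(WORKFORCE, WEEKLY_MEAN, RISE)
print(f"Implied weekly income of the newcomer: ${implied:,}")
print(f"Rise in the mean: ${mean_shift(WORKFORCE, WEEKLY_MEAN, implied):.2f}")
print(f"Rise as a share of the old mean: {RISE / WEEKLY_MEAN:.1%}")
```

On these numbers, a single weekly income of around one and a half billion dollars is enough to lift the mean for 13 million people by $116.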

There is no rational reason for an individual to have such obscene wealth, and we might get sidetracked by the savage inequality. Still, the example shows what happens to averages when there are outliers.

They get distorted.

Add Mr Bezos to the Australian workforce, and average income goes up materially even though workers would not see a cent of it.

Suppose Mr Musk and Mr Gates also came down under.

The average weekly wage goes up by over 15%, thanks to three outliers in 13 million. 

Try it this way

Suppose we select 100 males at random from a population of college students and measure their height. We could reasonably assume that the average represents the height of a typical male in college.

But suppose this was a college with a strong basketball program, and the sample included three of the tallest members of the team. We have outliers again.

The challenge is to tell the two kinds of outlier apart. Sometimes an outlier is plausible (there is a basketball program) and so is a legitimate part of a population that is, on average, taller as a result. Sometimes an outlier is improbable. Unlikely outliers add skew to the data and, whilst the calculation is still statistically sound, can make for shaky conclusions.

The statistical rationale is that very few variables in real life are distributed normally. They are skewed, typically by a few large outliers, so that the mean is larger than both the mode (the most common value) and the median (the middle value when the numbers are sorted).

Medians and modes offer one solution to the problem. The average is only one measure of central tendency, the middle of the distribution. It is helpful to use the others, especially the median (the midpoint of the observed values, such that a value is equally likely to fall above or below it) or the mode (the most frequent value).
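
A small sketch in Python shows how differently the three measures react to an outlier. The weekly wages below are invented for illustration and are not ABS data. The functions come from Python's standard `statistics` module.

```python
# Sketch only: mean, median and mode with and without one extreme value.
# The wage figures are hypothetical, chosen to keep the arithmetic easy to follow.
import statistics

wages = [900, 1200, 1200, 1500, 1800, 2000, 2400]   # hypothetical weekly wages
with_outlier = wages + [1_000_000]                   # add one hypothetical outlier

def summarise(values):
    """Return the three common measures of central tendency."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }

before = summarise(wages)
after = summarise(with_outlier)

for name in ("mean", "median", "mode"):
    print(f"{name:>6}: {before[name]:>12,.2f} -> {after[name]:>12,.2f}")
```

In this made-up sample the mean jumps from about 1,571 to 126,375, while the median only moves from 1,500 to 1,650 and the mode stays at 1,200.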

There is a simpler solution

Statisticians, politicians, and the media are fond of describing the average with a mean. They use means all the time to convey information.

The mean is where you add up all the values and then divide by the number of values.

But be careful. There are outliers everywhere, and they tend to make means larger.

The simple solution is to know about outliers and be cautious of means, especially when whoever is reporting the mean benefits from it being large.


Hero image from photo by Charles Deluvio on Unsplash