Saturday, April 26, 2008

Mysteries of the Bell Curve Revealed

Behold the famous Bell Curve (aka 'the normal distribution'), loved by some, loathed by many, but indispensable to social sciences like psychology. The Bell Curve is what you get when you graph the distribution of things like people's height, weight, mood, IQ, extroversion, exam scores, affinity for chocolate, willingness to vote Labor, or indeed almost anything in which people vary. Again and again the same pattern appears: some people are on one extreme (e.g., extremely tall), some people are on the other extreme (e.g., extremely short), but most people are relatively average.

Why should this be? Why should such a disparate range of variables all distribute in this way? Is this some sort of divine signal? Or perhaps a government conspiracy? How bizarre that exactly the same shape should come up again and again!

Well, there's actually a good reason for the ubiquity of the Bell Curve. I only realised this a couple of years ago when I saw a nifty little exhibit at the Questacon National Science and Technology Centre in Canberra. The exhibit was quite simple. Mounted on a wall, inside a glass case, was a series of pegs arranged vertically in a triangle. Visitors to Questacon were asked to drop a ball into a small chute at the top of the case, just above the topmost peg. The ball would hit the peg and bounce either left or right, then fall and hit another peg and bounce either left or right, and so on, ricocheting all the way to the bottom.

The pegs were carefully arranged so that at each level the ball had a 50/50 chance of falling either to the left or to the right. And so, each time a ball was dropped, it would take a different path through the pyramid of pegs. When the ball reached the bottom, it would fall into one of several slots lined up along the bottom; sensors in these slots, wired up to a computer, recorded the end point of each ball's journey.

The computer kept track of the outcomes. A running tally of the number of balls that had fallen into each slot was presented on a little screen in the form of a bar graph: the greater the tally for a given slot, the higher its bar.

What kind of shape do you think this graph showed after tens of thousands of ball drops? That's right, a bell curve!

Slots 1 and 9 had the smallest tallies, slots 2 and 8 had slightly more, 3 and 7 more again, 4 and 6 had even more, but slot 5 had the largest tally of all. And from this exhibit it's not difficult to see why.

For the ball to make it to slot 9 everything has to go right...literally! (On each peg the ball has to fall to the right). Thus, there's only one path to slot 9. And similarly, for the ball to reach slot 1 everything has to go left, so there's only one path to slot 1. But for slots closer to the center there are multiple routes that the ball can take. In fact, the closer a slot is to the center, the greater the number of possible routes, and the more frequently the ball reaches it. The consequence of this is a lovely bell curve.
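
To put some numbers on this, here is a minimal Python sketch that counts the routes to each slot. It assumes the exhibit had 8 rows of pegs feeding the 9 slots (the row count is my inference from the slot count, not something I measured), and the function name `paths_to_slots` is made up for illustration. The number of routes to a slot is just the number of ways of choosing which bounces go right.

```python
from math import comb

def paths_to_slots(rows):
    """Number of distinct left/right routes ending in each slot.

    A ball that bounces right k times out of `rows` bounces lands in
    slot k + 1, so the route count for that slot is "rows choose k".
    """
    return [comb(rows, k) for k in range(rows + 1)]

ROWS = 8  # assumption: 8 rows of pegs feed the 9 slots described above
counts = paths_to_slots(ROWS)
total = sum(counts)  # every route is equally likely with 50/50 bounces

# Print one bar per slot, with the share of all routes that end there.
for slot, n in enumerate(counts, start=1):
    print(f"slot {slot}: {n:2d} routes ({n / total:5.1%}) {'#' * n}")
```

Under that assumption, slots 1 and 9 each have a single route, while slot 5 has 70 of the 256 possible routes.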

So how does this relate to other variables?

Consider the case of an exam. How well a given student does on the exam depends on many different factors. For example:

- How much effort was put into studying
- How intelligent the student is
- How confident the student is on the day
- How much sleep the student got the night before
- Whether the student gets to the exam on time
- Whether the student is sitting next to someone who will distract them in the exam


For a student to get the highest possible score on the exam everything has to go right. That is, all of the factors that influence exam score have to go favorably: the student has studied hard, is intelligent, is confident, arrived on time, etc. And conversely, for a given student to get the worst possible score everything has to go wrong. For most students, however, some things will go right and some things will go wrong (in various combinations). Thus, most students will obtain a relatively average score.

So with exams, as with the pegs, we can see that there are more paths to being average than to being extreme, resulting in a bell curve. The same is true for other variables. For example, there are many different things that influence height (genes, nutrition, etc.) and so there are many more paths to being of average height than there are to being either extremely tall or short. Similarly, there are many different influences on people's love of chocolate (past experiences, advertising etc.) and so ratings of chocolate admiration should also distribute as a bell curve.

In other words, the bell curve is the distribution you get when many independent influences add together. And so the ubiquity of the Bell Curve in social science is not really that mysterious after all. The bell curve, my friends, is simply the signature of complexity. No wonder it pops up everywhere.


Here's a wee experiment you can do at home. Take 5 coins and toss them all together. Count up the number of heads and write it down. Repeat this 30 or more times, each time writing down the number of heads that come up.

When you're done, count up the number of times you got 5 out of 5, 4 out of 5, 3 out of 5, 2 out of 5, 1 out of 5, and 0 out of 5. Now draw a bar graph. What does the shape of the graph look like?

If you're brave, try doing this with 10, 20, or 30 coins at a time. If you're smart, just do it in Excel.
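
If you'd rather not use Excel, the coin experiment can also be sketched in a few lines of Python. This is only a rough sketch: the function names `toss_experiment` and `print_bar_graph` are made up for illustration, and it assumes fair coins.

```python
import random
from collections import Counter

def toss_experiment(coins=5, repeats=30, seed=None):
    """Toss `coins` fair coins `repeats` times; tally the heads per toss."""
    rng = random.Random(seed)  # pass a seed to get the same result every run
    tally = Counter()
    for _ in range(repeats):
        heads = sum(rng.random() < 0.5 for _ in range(coins))
        tally[heads] += 1
    return tally

def print_bar_graph(tally, coins):
    """Print one row of '#' marks for each possible number of heads."""
    for heads in range(coins + 1):
        print(f"{heads:2d} heads | {'#' * tally[heads]}")

tally = toss_experiment(coins=5, repeats=30)
print_bar_graph(tally, coins=5)
```

With only 30 repeats the graph will be lumpy and will differ from run to run. Try `toss_experiment(coins=30, repeats=1000)` to see the bell shape fill in.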


Anonymous said...

Which is why nobody can be a complete A-hole!

How does habit fall into this theory?

Mark said...

Well, at least it's unlikely for a person to be an A-hole in all respects.

How does habit fall into this theory? Hmmm...not sure...what do you think Anon?

Anonymous said...

It's just that humans have one influence that a ball doesn't: preconditioning. This will change the element of randomness, surely?

Mark said...

I think the theory still holds up. The bell curve comes about when there are multiple independent influences; it doesn't matter whether they are random or not. The point of the falling ball exhibit is that where the ball lands is the product of multiple influences (i.e., how it bounced on the first peg, how it bounced on the second peg, etc.). But the same principle applies to influences on human behaviour that are non-random.

Let's say we want to find out how much people are willing to pay for chocolate. We ask 1000 different people how much they'd be willing to pay for a bar of chocolate. Now there's a number of non-random things that influence their answer:

- the memory of the first time they had chocolate and how much they loved it
- the memory of a time when they ate too much chocolate and it made them sick
- their impression of that brand of chocolate
- whether they've just had dessert
- if they're on a diet
- how much money they have available to spend
- genetics
- personality
- preconditioning

and thousands and thousands of other influences. None of these are truly random; they've all come about for a reason. But these influences are at least partially separate from one another. Some of them increase people's willingness to pay for chocolate, and some decrease it.

So in our group of 1000 people, the person who is the most willing to pay for chocolate will be the one for whom all of the influences are pointing in a chocolate-paying direction (i.e., he has money, he loves chocolate, he hasn't had dessert). And conversely, the person who is the least willing is the person for whom all of the influences are pointing in a non-chocolate-paying direction.

However, there are relatively few combinations of influences that can lead to these extremes (just like there is only one path for the ball to take to the outside of the triangle). Yet, there are many paths to being in the middle. For example, to be of average willingness to pay for chocolate you just need a mix of some things in favour of paying and some things against, it doesn't matter what combination exactly.

So it doesn't matter if the influences are random, genetic, conditioned, rational, learned, or beaten into you with a stick. The moral of the story is that where you have multiple influences, you get a bell curve.
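
Here is a rough Python sketch of the chocolate example. Everything in it is invented for illustration: the function names, the 40 influences, their strengths, and how often each one points in a chocolate-paying direction. It also assumes the influences add up and vary independently from one person to the next, which is a simplification of real people.

```python
import random
from collections import Counter

def willingness_scores(people=1000, influences=40, seed=0):
    """Give each person a score that is the sum of many separate influences."""
    rng = random.Random(seed)
    # Made-up numbers: each influence has its own strength and its own
    # chance of pushing a given person towards paying more.
    strengths = [rng.uniform(0.5, 1.5) for _ in range(influences)]
    chances = [rng.uniform(0.2, 0.8) for _ in range(influences)]
    scores = []
    for _ in range(people):
        # Each influence either adds its strength or subtracts it.
        score = sum(s if rng.random() < p else -s
                    for s, p in zip(strengths, chances))
        scores.append(score)
    return scores

def histogram(scores, bins=11):
    """Count how many scores fall into each of `bins` equal-width ranges."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero if all scores match
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in scores)
    return [counts[b] for b in range(bins)]

# One row per range of scores, lowest willingness first.
for count in histogram(willingness_scores()):
    print("#" * (count // 5))
```

Even though the influences here have different strengths and are not 50/50, most of the 1000 simulated people should still pile up in the middle ranges, with only a handful at either extreme.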

I'm thinking about making a little video on this topic. I might do an animation with a person as the ball, making a decision at each peg about which way to turn.

Lee said...

Does it change if it is a square peg? (p<.05, of course)

Anonymous said...

Can't there be a question or an action that is almost guaranteed to result in all 1000 people reacting the same way? I'm thinking of that guy who can put people under hypnosis in seconds. Much of his power is knowing exactly how people will react by using subliminal messages and misdirection. He knows exactly how the brain reacts to certain methods. Therefore, if he does this to 1000 people, there is no bell curve as they all react the same.

Am I just being argumentative and clutching at straws here?

Mark said...

Don't underestimate how much of Derren Brown's stuff is stage magic and editing. But hypnotisability varies greatly between people, and I dare say that hypnotisability would be distributed as a bell curve.

Now, you can purposely go out and select people who you think will respond a certain way, so that you don't get a range of responses. And I guess the answer to the question "What is 1+1" is unlikely to vary much between people. But there's usually at least some variance across people in most things.

Mark said...

I'm planning on doing a post on subliminal priming in the future too.

Anonymous said...

Ooh, sounds like a good Podcast topic!

Anonymous said...

You're right when it comes to humans vs a nonliving object. There is a lot to consider when you're trying to obtain statistical values from them. An object won't have that many independent values that can influence the outcome of your research, whereas an individual's habits/tics can greatly influence data. If a person has low self-esteem they are more likely to second-guess themselves on a test. The same goes for those who are more easily stressed. You could also take into account those individuals who do well on certain kinds of tests, such as multiple choice, but not on cognitive thinking (short answer) questions. Every factor should be taken into account. Of course, with humans you are more likely to have outliers that throw off the curve, and then you have to evaluate whether or not they are significant. There is a lot of depth that goes into the bell curve.