of Apples and Ounces

22 Oct, 2008 by adam in Blog

How much does this apple weigh?

It started with a simple question to my weekly dinner party last Thursday night: what is the best way, using everyday household objects, to measure the weight of an apple as accurately (and repeatably) as possible? The units of measurement are unimportant, so long as they can be easily calibrated against a known standard (grams, etc).

“That’s easy,” said my business consulting friend, Steve. “Just make a website, and ask as many people as possible how much they think it weighs. As the number of participants approaches infinity, the average answer will be closer and closer to dead-on.”

My incredulity was palpable. “WHAT?” I said, intimating that perhaps my good friend was “off his proverbial rocker.” The absurdity of this notion–that the guesses of infinite random people would yield the correct weight of an apple–was too much for me to let slide. The debate grew heated, and continued via email for the rest of the following week.

We got together again, and the discussion continued. I began asking everyone I know what they thought of this idea. “Bologna” was the most common response. “Malarkey.” “Rubbish.” But because Steve held so vehemently to his postulate–and because my wife, having read statistical data to confirm his claims, tended to agree with him–I was forced to continue to consider whether or not this could possibly contain any thread of truth whatsoever.

My opponent consulted with the statistics guru at his firm, who seemed to believe that there was truth in the notion, so long as certain assumptions about the “perfectly average” apple held and certain other requirements were met.

We drew graphs, consulted theorems, created equations, made analogies, cited case-studies, dwelled in the hypothetical, and our emails became increasingly terse. The goal of this blog post will be to communicate our conclusions thus far, and to entertain the potential that further research needs to be done on the topic to achieve satisfactory resolution to our quarrel, lest our next dinner group erupt into the great statistical brouhaha of 2008.

The real issue:

The real question here is whether the ontological weight of an apple has any relationship at all to the perceptual one, at least insofar as there is no inherent bias in all humans that would cause more people to vote too high than too low, or vice versa. If it can be shown that humans have no inherent bias in the weighing of apples–or more precisely, this particular apple–then it is possible that there is truth in my good friend’s claim that the more people you ask, the more accurate their average response. Take the following diagram for example.

What this diagram illustrates is the simplified curve that would almost certainly result from a survey of this kind. Steve’s assertion is that the mean (or “average”) guess will be closer to the correct weight of the apple as the number of participants approaches infinity, as shown in the next figure.

I do not debate this statistical reality. It is intuitively true that the more people you ask, the more pronounced your mean guess will become, and therefore the more precise the collective guess. However, precision is not the same as accuracy, and herein lies the debate.

The parameters of the debate:

Steve’s “Statistical Prediction of the Weights of Fruits” theorem:

When surveyed regarding the mass (M) of a given fruit (F), the mean (average) answer of participant group (P) will reflect the correct mass (M) of fruit (F) as the size (S) of the group (P) approaches infinity.

Adam’s “Ontology of Individual Fruits” theorem:

Any given fruit (F) at a given point in time (T) will have exactly one specific and exact degree of compliance to gravitational force (G), and this compliance is solely dependent upon the mass (M) of the fruit (F) at time (T). It should be possible through physical experimentation to relate the gravitational force (G) on fruit (F) at time (T) to a given mass (M) expressed in standard units, i.e., grams. This mass (M) is wholly unrelated to any human action or opinion, except by the pure speculation of the observer based upon previous experience with similar fruits.

Essentially the debate is this: While it is agreed that an individual fruit has exactly one specific and demonstrable mass, and that the result of a survey aggregating the estimates of a large sample size will yield an increasingly precise answer, it is not agreed that the precise answer in question will necessarily correlate with the demonstrable mass of the specific fruit in question. Steve purports that the ontology of the apple is correlated with the perception of said apple, while I am not yet convinced.

The discourse:

One of our first questions to Steve was whether the correct weight of the apple could be found by taking the mean (average) answer of the group, or by taking the median (middle) answer. His reply was that according to the “Central Limit Theorem”, these two values would be the same as the sample size approached infinity, and that therefore the question is irrelevant. Our good friend Tim, also a “dinner group” member, contributed this to the discussion:

Central limit theorem:

http://en.wikipedia.org/wiki/Central_limit_theorem

and a good applet demonstrating it:

http://sciris.shu.edu/thinklets/Math/Statistics/CLT/clt.html

How does this apply to our ongoing discussion about the apple? Well, it means that if infinite people have a lognormal distribution regarding the weight of an average apple (as we’ve noted, it’s unlikely to be truly gaussian, since nobody will say less than zero), and you repeatedly sample N people, M times, to get an estimate of the mean believed weight of the apple (the mean of the underlying lognormal distribution), then those M estimates will be roughly normal in their distribution. As N becomes very large, the distribution of the M estimates of the mean will become tighter and more normal about the true value (the value from the lognormal distribution).

But it needn’t be true that the underlying distribution, which is lognormal (but could be something else non-gaussian), have the same median and mean. So what’s the better estimator? If people are equally likely to deviate on the heavy side as they are on the light side, you should use the median of the distribution, not the mean. But if you want to give each person an equal weight in the estimation, you should use the mean.

Putting this question aside for a minute, let’s assume that the mean is the value we’re after. Suppose the distribution of beliefs regarding the weight of an average apple has a mean of 5 ounces, and a standard deviation of 2 ounces. Based on the central limit theorem, you will have gaussian noise in the estimate of the mean after surveying N people, where the noise decreases as 1/sqrt(N). For N = 6 billion, you would have a noise of sigma/sqrt(6 billion) = 2/77460, or 2.582*10^-5 ounces, or 0.72 milligrams (28 grams per ounce) – 0.000516%. Let’s assume a more common survey size of 1000 — then your noise in the estimate rises to 2/sqrt(1000), or 0.0632 ounces – 1.77 grams – 1.26%. My point here is just that 1/sqrt(N) convergence isn’t all that good — it takes 100 times as many samples to pin down your answer to one more decimal place. In theory, sampling an infinite number of variates seems like a fine idea — but in a practical survey, you run into limits — even if all the people in the world could cast their vote as unbiased estimators, they would likely be trumped by good scientific instrumentation.

~~Tim
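Tim's numbers are easy to check with a few lines of Python. The sketch below is only an illustration under his stated assumptions (a mean guess of 5 ounces, a standard deviation of 2 ounces, 28 grams per ounce, and a lognormal spread of opinions). The function names, the random seed, and the survey sizes used in the simulation are hypothetical choices made for the example.

```python
import math
import random
import statistics

GRAMS_PER_OUNCE = 28.0  # Tim's rounded conversion factor


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the mean of n independent guesses."""
    return sigma / math.sqrt(n)


def lognormal_params(mean: float, sd: float) -> tuple[float, float]:
    """Convert a desired mean and standard deviation into the (mu, s)
    parameters of the underlying normal distribution."""
    s2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - s2 / 2.0
    return mu, math.sqrt(s2)


def simulate_survey(n: int, mean: float = 5.0, sd: float = 2.0, seed: int = 0):
    """Draw n lognormal guesses; return (sample mean, sample median)."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    mu, s = lognormal_params(mean, sd)
    guesses = [rng.lognormvariate(mu, s) for _ in range(n)]
    return statistics.fmean(guesses), statistics.median(guesses)


if __name__ == "__main__":
    # Tim's two survey sizes: a typical poll and (roughly) everyone on earth.
    for n in (1_000, 6_000_000_000):
        se = standard_error(2.0, n)
        print(f"N={n}: noise {se:.3g} oz, {se * GRAMS_PER_OUNCE:.3g} g, "
              f"{100 * se / 5.0:.3g}% of the mean")

    # For a lognormal, the median is exp(mu), which sits below the mean.
    mu, _ = lognormal_params(5.0, 2.0)
    print(f"theoretical median: {math.exp(mu):.2f} oz (mean is 5.00 oz)")
    for n in (100, 10_000, 200_000):
        m, med = simulate_survey(n)
        print(f"N={n}: mean {m:.2f} oz, median {med:.2f} oz")
```

Under these assumptions the theoretical median works out to about 4.64 ounces against a mean of 5, so for a skewed distribution the two estimators settle on different values however many people are asked. That is the reason Tim's question about which one to use does not go away as the sample grows.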

Tim’s point was later summarized by another dinner group participant, Andy:

To reiterate in briefer form–Tim raises a good point, which is that even if people are unbiased estimators of the *true* weight of an apple, the accuracy doesn’t improve as quickly with sample size as one might hope. Each extra digit costs more and more dearly.
Maybe we should get a better consensus as to what accuracy ‘household measurements’ can achieve, then go back to the statistics and see how many people we’d need to compete.

And again, there is no theorem in the world capable of tying the mean guess to the actual weight of any apple or mean-apple, unless we make further assumptions.

-Andy
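Andy's suggestion can be turned around into a small calculation: pick the accuracy you want to match, then solve sigma/sqrt(N) for N. The sketch below assumes Tim's 2-ounce spread and perfectly unbiased, independent guessers. The target accuracies are hypothetical placeholders, not measured figures for any real household instrument, and `required_sample_size` is a name invented for this example.

```python
import math

GRAMS_PER_OUNCE = 28.0  # same rounded conversion Tim used


def required_sample_size(sigma_g: float, target_se_g: float) -> int:
    """Smallest N for which sigma / sqrt(N) <= target_se_g."""
    if sigma_g <= 0 or target_se_g <= 0:
        raise ValueError("sigma and target must be positive")
    return math.ceil((sigma_g / target_se_g) ** 2)


if __name__ == "__main__":
    sigma_g = 2.0 * GRAMS_PER_OUNCE  # Tim's 2 oz spread, expressed in grams
    # Hypothetical target accuracies in grams (placeholders, not real specs).
    for target_g in (5.0, 1.0, 0.1):
        n = required_sample_size(sigma_g, target_g)
        print(f"to match +/-{target_g} g you need about {n:,} unbiased guessers")
```

Because N grows with the square of the precision you ask for, tightening the target by a factor of ten multiplies the required crowd by a hundred, which is Tim's "100 times as many samples" remark in another form.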

Connecting the actual and the perceptual

Our good friend Mark wondered what the connection is between the actual and perceptual weight of an object. If we are to believe that the actual weight is to be guessed correctly by a group of people, then we must admit that an assertion is being made about a connection between the reality of the apple, and the human perception of said apple.

Mark’s “Transfer of Ontology” theorem:

Any fruit (F), having a specific mass (M) (according to Adam’s “Ontology of individual fruits theorem”), can only be measured accurately by some means that is in some way connected or correlated with the mass (M) of fruit (F).

It may seem obvious that the opinion of a group is only relevant in so far as said group is educated about the topic on which it opines. Guessing the weight of an apple is a comparatively easy question, but say we were to ask how many ping-pong balls will fit inside a Boeing 747? This is a substantially more difficult question, but it has been shown statistically that large surveys do a pretty good job of answering it anyway. But what about an even harder question: “How many grains of sand exist in the world?” The hugeness of this question is beyond the grasp of almost any human. But what about something more mundane: “I’ve used a computer to generate a random number between 1 and 100. What is the number?” Am I to believe that an infinite group of people will answer even this question correctly?

The Record Player Analogy

I think it is obvious that the answer is “no,” and I think that Mark’s theorem explains why: there is no fundamental connection between the actual number selected by the computer and the guesses made by the survey sample. A simple illustration–again, Mark’s idea–is that of a record player. In order for a record player to work, a needle must make contact with a bumpy surface. When bumps in the surface are transferred mechanically into the needle, the resulting vibration is amplified through speakers and played back as coherent sound (or incoherent noise, depending on your musical taste) that can then be perceived through the ears.

The brilliance of this analogy is in its depth of insight. If there is no connection between the actual fact in question (the random number generated by our computer) and the perceptual (the guesses of the survey sample), then we have a situation analogous to a record player with no needle connecting the bumps on the surface of the vinyl (the fact) with the amplified signal heading to the speakers and out to the ears (the survey result). With no needle, the record won’t play, no matter how much you turn up the volume (or how many people you survey). No “central limit theorem” or any other statistical trick will fix this problem.

Conversely, if a record player has a nice, sharp needle, it will pick up even the smallest detail etched into the surface of the vinyl, and the amplified sound of Stravinsky will come pouring through the speakers in glorious detail. When shown an image of my five-fingered hand, and asked how many fingers I have on said hand, the survey results will undoubtedly yield “5”, as loud and clear as Coltrane’s left-channel performance on Giant Steps. This is because there is a strong, “sharp” connection between the actual (the image of my hand), and the perceptual (the survey sample’s count of the fingers on my hand).

However, let’s say that your record player has a needle, but the needle is very dull. The more dull the needle, the more tenuous the connection between the actual (the bumps on the vinyl) and the perceptual (the sound coming from the speakers). The question of how many grains of sand exist on earth is an example of a record player with an extremely dull needle. The audience surveyed intuitively knows the rough size and shape of a grain of sand, as most people have experience with it. The size of earth, however, and the percentage of earth that is filled with the minuscule particles, is a mystery to almost anyone. I hypothesize that the more familiar the masses are with the subject in question, the better their collective guess will be. With a dull needle, it doesn’t matter how much you turn up the volume, the music will be obscured.
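One way to make the analogy concrete, purely as a sketch, is to treat each guess as a blend of the truth and a truth-independent "anchor", plus individual noise. The coupling weight stands in for the sharpness of the needle. Everything numeric here is a hypothetical choice made for illustration and does not come from the dinner-group emails: the secret number 87, the noise level, the 0.3 "dull" coupling, and the anchor of 50.5 (the average of a uniform pick between 1 and 100).

```python
import random
import statistics


def survey_mean(truth: float, n: int, coupling: float,
                anchor: float = 50.5, noise_sd: float = 10.0,
                seed: int = 0) -> float:
    """Mean of n guesses, each centred on a blend of the truth and a
    truth-independent anchor, with Gaussian noise per guesser.

    coupling = 1.0 -> sharp needle (guesses centred on the truth)
    coupling = 0.0 -> no needle (guesses ignore the truth entirely)
    """
    rng = random.Random(seed)  # fixed seed (arbitrary) for repeatability
    centre = coupling * truth + (1.0 - coupling) * anchor
    guesses = (rng.gauss(centre, noise_sd) for _ in range(n))
    return statistics.fmean(guesses)


if __name__ == "__main__":
    secret = 87.0  # hypothetical number picked by the computer
    for label, coupling in (("sharp", 1.0), ("dull", 0.3), ("none", 0.0)):
        for n in (100, 100_000):
            mean_guess = survey_mean(secret, n, coupling)
            print(f"{label:>5} needle, N={n:>7}: mean guess {mean_guess:6.2f}")
```

In this toy model a bigger crowd only removes the per-person noise. The mean settles on the blended centre, so with no needle it converges to the anchor whatever the secret number is, and with a dull needle it converges to a value between the anchor and the truth.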

“Am I right or wrong?”

After two weeks of agonizing over this question, it is now my opinion that in the case of the apple, surveying a large sample of people is quite likely to yield an answer that is in fact very close to correct. The main reason for my change of heart is the realization that a) people are generally fairly well informed as to the general weight of an apple (at least compared to the general public’s knowledge of ping-pong balls in 747s), and b) humans are unlikely to have any particular consistent bias with regard to the weight of apples.

This is not, however, a general admission that survey results somehow magically yield “correct” answers to all questions. People can only estimate weights and measures because they have experience with the objects in question (like a record player with a dull needle, but a needle nonetheless). Ask an infinite crowd how many water molecules are in a particular cumulonimbus cloud, and I’m pretty sure the result will be more-or-less random noise.

The reality is that asking 6.7 billion people how much an apple weighs does not, in fact, tell you how much the apple weighs; it only tells you how much 6.7 billion people *think* an apple weighs. This is an obvious and incredibly important distinction. I said that I have no reason to believe that humans have a particular bias with regard to the weight of apples, but then again, they might. And if they do, no amount of mathematical hocus-pocus can correct the survey data. The survey data would be incredibly interesting nonetheless, because it would reveal a bias common to most humans. However, as a scientific instrument for measuring weight, the survey is unreliable at best.
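The precision-versus-accuracy distinction can be put in numbers. If every guesser shares a common bias b and has an individual spread sigma, the typical (root-mean-square) error of the average of N independent guesses is sqrt(b^2 + sigma^2/N). The sketch below evaluates that formula using Tim's 2-ounce spread. The half-ounce bias is a made-up figure for illustration only, since nobody in the discussion measured one.

```python
import math


def rms_error(bias: float, sigma: float, n: int) -> float:
    """Root-mean-square error of the mean of n independent guesses that
    share a common bias and have individual spread sigma."""
    return math.sqrt(bias ** 2 + sigma ** 2 / n)


if __name__ == "__main__":
    sigma_oz = 2.0  # Tim's assumed spread of opinions
    for bias_oz in (0.0, 0.5):  # 0.5 oz is a hypothetical shared bias
        for n in (10, 1_000, 100_000, 6_700_000_000):
            err = rms_error(bias_oz, sigma_oz, n)
            print(f"bias {bias_oz} oz, N={n:>13,}: typical error {err:.5f} oz")
```

With no bias the error keeps shrinking as the crowd grows. With any shared bias it levels off at the size of that bias, which is why a larger survey buys precision but not necessarily accuracy.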

Broad-reaching implications and opinion-based speculation

The beauty of this little exploration is that it helps to clarify the circumstances in which surveys (and quantitative study in general) are appropriate in business cases. In matters of opinion on well-known and mature subjects, quantitative study is likely to yield meaningful results. On the other hand, in matters of innovation and the creation of new and never-before-seen technologies, a crowd is only as wise as its collective experience.

If you were to survey everyone on earth thirty years ago about the relevance of an incredibly powerful automatic computing machine as a home appliance, I think the results would be far from predicting what was to come. The average person of 1978 had no conception of what the word “computer” would come to mean, much less the myriad daily uses it would have. It’s a good thing Apple didn’t use surveys to validate its ideas; it surely would never have invented the PC.

about adam:
Adam O'Hern is an industrial design consultant specializing in visual brand languages, and has designed products ranging from laptops to power tools, classroom toys to bathroom fixtures, and robots to lint rollers. He has published with 3DWorld Magazine, CGTuts+, and Luxology, and works with Josh Mings of SolidSmack.com on EngineerVsDesigner.com.
