It turns out this is not the case. Sample size matters just as much as how representative the sample is of the population. Get either one wrong--the size or the representativeness--and your results are compromised.
Most people, in my experience, do not understand what statistical significance means. In a nutshell, it tells us the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true--that is, assuming the result is due to random chance alone.
In Marshall's case, for instance, it tells us the likelihood of getting the result assuming each rater is randomly choosing the odd-man-out beer sample.
When a result is statistically significant at, say, the .05 level, it means that if people were choosing randomly, we'd see such a result less than 5 percent of the time. When the result would be very unlikely under chance alone (say, a P value of .001), we're saying we no longer believe the result is due to chance--it more likely represents a real difference.
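That tail probability is easy to compute directly for a "pick the odd one out" setup. The sketch below is illustrative, not Marshall's actual data: the rater count and the number of correct identifications are made-up numbers, and the 1-in-3 guessing chance assumes each rater faces three samples, one of which is the odd one.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more
    correct identifications if every rater is guessing at random."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: 25 raters, each with a 1-in-3 chance of
# picking the odd beer purely by guessing, and 14 get it right.
p_value = binom_tail(25, 14, 1/3)
print(p_value)
```

If that printed value comes out below .05, we'd doubt that the raters were merely guessing.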
***************
As you may have surmised, I teach this subject. One example I use to get this across is to imagine a local city council race. Martha Jones, candidate for city council, says she's going to win.
You take a random sample of 20 likely voters (small sample, but it helps illustrate the point).
How few Martha supporters would you have to see in a random sample of 20 likely voters for you to not believe Martha's claim?
If you had 10 out of 20, you'd probably say that this was consistent with her claim, for if she truly had 51 percent support in the voting population, 10 out of 20 in a sample would be...reasonable to expect.
But what if you had NO voters in favor of Martha in your sample of 20? Would you still believe her claim? I would not--I'd expect, in a sample of 20, to get at least SOME voters favoring her if she truly was destined to win.
So then, when I'm teaching this, I'll ask students at what threshold they'd start to doubt Martha's claim that she's going to win: is it 1 vote in favor of Martha? Two votes? Perhaps 3 or 4 or 5 votes?
Obviously each of us has a different sense of how few is enough to doubt her claim of victory. That's why we use statistical significance to make those determinations, rather than our gut. Significance provides an objective standard that others can reproduce, rather than a gut feeling whose basis is impossible to know.
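The threshold question above has a precise answer. A short sketch (names are my own, not from the original): compute the probability of seeing k or fewer Martha supporters in a sample of 20, assuming her true support really is 50 percent.

```python
from math import comb

def binom_cdf(n, k, p):
    """P(X <= k) for X ~ Binomial(n, p): the chance of seeing k or
    fewer Martha supporters in a sample of n if her true support is p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# How surprising is each low count in a sample of 20?
for k in range(7):
    print(k, "supporters:", round(binom_cdf(20, k, 0.5), 5))
```

Running this shows that 5 or fewer supporters has a probability of about .021 under her claim, while 6 or fewer is about .058--so at the conventional .05 level, the objective cutoff lands right around 5 votes, inside the very range students argue about by gut feel.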
**************
I use 20 voters because I can also have students flip coins to simulate random votes for or against Martha. When I do this I typically get 7, 8, 9, 10, 11, 12, or 13 votes for her, plus the occasional 6 or 14, and even 5 or 15. It shows how easily a sample of 20 drawn from a population split 50-50 can land far from the expected value of 10.
But the kicker is, I'll do this 10 times. Students are flipping coins, and I'm joking with them that this is what their tuition buys, i.e., flipping coins. After doing it 10 times I have 10 samples of 20--but also one sample of 200 flips. The variation in percentages evident in the samples of 20 disappears in the sample of 200. It's almost always within a couple percent of 50 percent.
It helps them see the effect of sample size on results--and why they're paying tuition to flip coins.
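The classroom exercise can be simulated directly--a rough sketch, with all names my own. It draws ten samples of 20 fair-coin "votes," then pools the same flips into one sample of 200, so you can see the spread shrink just as it does in class.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def flip_votes(n):
    """Simulate n coin-flip voters; return how many favor Martha."""
    return sum(random.random() < 0.5 for _ in range(n))

# Ten samples of 20, as in the classroom exercise.
samples = [flip_votes(20) for _ in range(10)]
percents = [100 * s / 20 for s in samples]

# The same 200 flips viewed as one big sample.
pooled = 100 * sum(samples) / 200

print("samples of 20 (% for Martha):", percents)
print("pooled sample of 200 (%):", round(pooled, 1))
```

The ten small-sample percentages typically bounce around between roughly 30 and 70 percent, while the pooled figure sits much closer to 50--the effect of sample size, in a dozen lines.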