Insensitivity to Sample Size

aka Sample Size Neglect · Law of Small Numbers · Sample Size Fallacy

Treating small samples as equally reliable as large ones when judging probabilities, ignoring how much randomness small samples contain.

WHAT IT IS

The glitch, explained plainly.

Imagine you flip a coin 4 times and get 3 heads. You might think 'wow, this coin lands on heads 75% of the time!' But if you flipped it 1,000 times, you'd get much closer to 50/50. We forget that tiny amounts of information can look very different from the big picture, like tasting one grape and deciding the whole vineyard is sour.

Insensitivity to sample size occurs when people fail to appreciate that smaller samples are inherently more variable and less reliable than larger ones, leading them to draw equally strong conclusions regardless of how many data points underlie a result. This bias manifests as treating a finding from 10 observations with the same confidence as one from 10,000 observations. People intuitively expect any random sample, no matter how small, to closely mirror the overall population — a misconception Tversky and Kahneman called 'belief in the law of small numbers.' The result is systematic overconfidence in patterns detected in limited data and a failure to demand more evidence before drawing conclusions.
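A short simulation makes the coin-flip intuition concrete (a minimal sketch; the 40–60% band, trial count, and flip counts are arbitrary choices for illustration):

```python
import random

def extreme_share(n_flips, trials=2_000, seed=0):
    """Fraction of repeated experiments where a fair coin's heads
    share lands outside the 40-60% band."""
    rng = random.Random(seed)
    extremes = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if not 0.4 <= heads / n_flips <= 0.6:
            extremes += 1
    return extremes / trials

small = extreme_share(10)     # roughly a third of 10-flip runs look lopsided
large = extreme_share(1_000)  # almost no 1,000-flip runs do
print(f"10 flips:    {small:.0%} of runs fall outside 40-60% heads")
print(f"1,000 flips: {large:.0%} of runs fall outside 40-60% heads")
```

The same 50/50 coin looks wildly unfair in a third of the small experiments and almost never in the large ones.
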

SOUND FAMILIAR?

Where it shows up.

  1. A school principal notices that the top-performing school in the district is a small rural school with only 40 students. She launches an initiative to break large schools into smaller units, convinced that small size causes excellence — without realizing that the worst-performing school in the district is also a small rural school, with 38 students.
  2. A sports fan watches a rookie quarterback complete 8 of 10 passes in his first game and declares him the best passer in the league, even though veteran quarterbacks' completion rates are calculated over thousands of attempts.
  3. A pharmaceutical company runs a pilot study with 12 patients and finds that 75% improved on the new drug. The CEO fast-tracks the drug for mass production, bypassing the planned 2,000-patient trial, saying 'The results speak for themselves.'
  4. A hiring manager interviews three engineers from the same university. Two of them were outstanding, so she instructs HR to prioritize all candidates from that university, treating a sample of three as definitive evidence of institutional quality.
  5. Two hospitals record birth data. The large hospital delivers 45 babies a day; the small one delivers 15. Asked which hospital is more likely to record days where over 60% of births are boys, most people confidently answer 'both equally likely,' reasoning that the underlying probability is the same at both hospitals. In fact the small hospital records far more such days, because its daily totals swing much further from 50/50.

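The hospital question has an exact answer under a simple binomial model. A sketch, assuming each birth is an independent 50/50 draw (the 15 and 45 births per day come from the example above):

```python
from math import comb

def prob_share_boys_above(n_births, threshold=0.6, p=0.5):
    """Probability that strictly more than `threshold` of n_births
    are boys, under an exact binomial model with P(boy) = p."""
    return sum(comb(n_births, k) * p**k * (1 - p)**(n_births - k)
               for k in range(n_births + 1)
               if k / n_births > threshold)

small = prob_share_boys_above(15)   # small hospital: 15 births/day
large = prob_share_boys_above(45)   # large hospital: 45 births/day
print(f"Small hospital (15/day): {small:.1%} of days exceed 60% boys")
print(f"Large hospital (45/day): {large:.1%} of days exceed 60% boys")
```

The small hospital crosses the 60% mark on more than twice as many days, even though the per-birth probability is identical at both.
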
IN DIFFERENT DOMAINS

Where it shows up at work.

The same glitch looks different depending on the terrain. Finance, medicine, a relationship, a team — same mechanism, different costume.

Finance & investing

Investors frequently evaluate mutual funds or stock-picking strategies based on a few months or quarters of returns, treating short-term outperformance as evidence of genuine skill rather than recognizing that small time samples produce high variability and that streaks are expected by chance alone.

Medicine & diagnosis

Clinicians may draw strong conclusions about a treatment's efficacy from a handful of patients they've personally treated, overriding large randomized controlled trials. Similarly, patients dismiss large-scale safety data in favor of a few anecdotal adverse reactions reported by people they know.

Education & grading

Teachers form confident judgments about a student's ability from one or two early assignments, and school administrators rank programs or curricula based on test scores from very small cohorts where random variation dominates actual quality differences.

Relationships

People judge a new partner's character based on a few early interactions — a small sample of behavior — and form rigid expectations that resist updating even as more information becomes available over time.

Tech & product

Product teams run A/B tests with insufficient sample sizes and ship features based on early results that appear statistically significant, only to see the effect vanish when rolled out to the full user base. UX researchers draw design conclusions from usability tests with 3-5 participants without acknowledging the limits of such small samples.
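How much traffic an A/B test actually needs is easy to underestimate. A rough power calculation (a sketch using the textbook normal approximation; the 5% baseline conversion rate and one-point lift are made-up numbers, and the z-values correspond to two-sided alpha = 0.05 with 80% power):

```python
from math import ceil

def samples_per_arm(p_base, p_treat, alpha_z=1.96, power_z=0.84):
    """Approximate visitors needed per arm to detect the difference
    between two conversion rates (normal-approximation formula)."""
    variance = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    effect = abs(p_treat - p_base)
    return ceil((alpha_z + power_z) ** 2 * variance / effect ** 2)

# Detecting a lift from 5% to 6% conversion:
print(samples_per_arm(0.05, 0.06))
```

Roughly eight thousand visitors per arm just to reliably detect a one-point lift; an early peek at a few hundred users is mostly noise.
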

Workplace & hiring

Performance reviews are heavily influenced by a few recent memorable incidents rather than a full year of work. Hiring committees generalize about entire universities, bootcamps, or previous employers from encounters with just a few candidates.

Politics & media

Polls with very small sample sizes are reported with the same authority as large-scale surveys. Voters and commentators treat a handful of town hall reactions or viral social media posts as representative of public opinion at large.

HOW TO SPOT IT

Ask yourself…

  • How large is the sample I'm basing this conclusion on — and would I feel equally confident if the sample were half this size or double it?
  • Am I treating a handful of personal experiences or anecdotes as if they were a large, controlled study?
  • If someone showed me the same percentage from a much smaller or much larger group, would I react differently?
HOW TO DEFEND AGAINST IT

The playbook.

  • Always ask 'What is N?' before drawing conclusions from any statistic — make sample size your first question, not an afterthought.
  • Use the 'shrink the sample' test: mentally reduce the sample to an absurdly small number (e.g., 2 people) and check if your confidence changes. If it does, you're sensitive to sample size — now calibrate properly for the actual N.
  • Convert percentages back to raw numbers: '75% success rate' feels different when you learn it means 3 out of 4 people versus 750 out of 1,000.
  • Apply a 'confidence interval' mindset: remind yourself that small samples have wide error bars, and the true value could be far from what you observed.
  • Seek out base rates and large-scale studies before trusting small-sample anecdotes or pilot results.
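The 'raw numbers' and 'wide error bars' moves above can be combined in one calculation. A sketch using the normal-approximation confidence interval (deliberately crude at n = 4, which is exactly the point the wide interval makes):

```python
from math import sqrt

def normal_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion,
    clipped to [0, 1] (normal approximation)."""
    p = successes / n
    half = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

for k, n in [(3, 4), (750, 1000)]:
    lo, hi = normal_ci(k, n)
    print(f"{k}/{n} = {k/n:.0%}, 95% CI roughly {lo:.0%} to {hi:.0%}")
```

The same '75% success rate' spans most of the possible range at 3 out of 4, but pins down a narrow band at 750 out of 1,000.
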
FAMOUS CASES

In history.

  • Howard Wainer and Harris Zwerling demonstrated that both the highest and lowest kidney cancer rates in U.S. counties occurred in small, rural counties — not because of environmental factors but because small populations produce more extreme statistical fluctuations.
  • The Gates Foundation's small-schools initiative invested heavily in breaking up large schools into smaller ones after observing that top-performing schools tended to be small, without recognizing that the worst-performing schools were also disproportionately small due to sampling variability.
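The pattern behind both cases is reproducible with a toy simulation: give every county the same true rate and vary only the population, and the extremes still cluster in the small counties (the populations, county counts, and 2% rate below are arbitrary illustrative choices):

```python
import random

def simulate_counties(n_small=100, n_large=100, seed=1, rate=0.02):
    """All counties share the SAME true rate; only population differs
    (100 vs 10,000 residents). Returns (observed_rate, size) sorted."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_small):
        cases = sum(rng.random() < rate for _ in range(100))
        results.append((cases / 100, "small"))
    for _ in range(n_large):
        cases = sum(rng.random() < rate for _ in range(10_000))
        results.append((cases / 10_000, "large"))
    results.sort()
    return results

results = simulate_counties()
print("Lowest observed rate: ", results[0])
print("Highest observed rate:", results[-1])
```

Both the best-looking and worst-looking counties are small ones, purely because of sampling noise; no causal story is needed.
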
WHERE IT COMES FROM
Academic origin

Amos Tversky and Daniel Kahneman, 1971 ('Belief in the Law of Small Numbers,' Psychological Bulletin, 76(2), 105–110); further elaborated in their 1974 paper 'Judgment under Uncertainty: Heuristics and Biases' in Science.

Evolutionary origin

In ancestral environments, humans rarely encountered formal statistical data. Decisions were made from small, personally observed samples — a few encounters with a predator, a handful of foraging trips to a location. Treating these small observations as representative was often adaptive because the cost of waiting for a larger sample (e.g., more predator encounters) could be fatal. Quick generalization from limited experience was a survival advantage when data collection itself was dangerous.

IN AI SYSTEMS

How the machines inherit it.

Machine learning models trained on small or unrepresentative datasets can produce overfit predictions that appear highly accurate during training but fail to generalize. Practitioners who evaluate model performance on small test sets can badly misjudge accuracy, because the measurement itself carries wide error bars. In recommendation systems, items with few ratings can receive extreme average scores (very high or very low) that disproportionately influence rankings, an issue closely related to the cold-start problem.
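The small-test-set trap is easy to demonstrate: fix a model's true accuracy and watch how much the measured accuracy swings with the size of the evaluation set (a sketch; the 80% true accuracy and the set sizes are arbitrary choices):

```python
import random

def observed_accuracies(test_size, true_accuracy=0.8, runs=1_000, seed=0):
    """Each run draws a fresh test set of `test_size` examples from a
    model whose true accuracy is fixed, and records what we'd measure."""
    rng = random.Random(seed)
    return [sum(rng.random() < true_accuracy
                for _ in range(test_size)) / test_size
            for _ in range(runs)]

for n in (20, 2_000):
    accs = observed_accuracies(n)
    print(f"test set of {n}: measured accuracy ranges "
          f"{min(accs):.0%} to {max(accs):.0%}")
```

On 20 examples the same model can look anywhere from mediocre to perfect; on 2,000 the measurement stays within a few points of the truth.
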

FREE FIELD ZINE

10 glitches quietly running your life.

A free field-zine PDF — ten cognitive glitches named, illustrated, with a defense move for each. Plus the weekly Glitch Report on Fridays — one bias named, two spotted in the wild, one defense move. Unsubscribe any time.


LAUNCH PRICE

Train against your blindspots.

50 cards are free to preview. Buyers unlock the rest of the deck plus the interactive training — Spot-the-Bias Quiz unlimited, Swipe Deck with spaced repetition, My Blindspots, Decision Pre-Flight, the Printable Deck + Cheat Sheets, and the Field Guide e-book. $29.50 (was $59).

Unlock the full deck

Everything below — yours forever. Pay once, use across every device.

Half-off launch — limited to the first 100 readers. Auto-applied at checkout.
$59 $29.50
one-time payment · lifetime access
  • All interactive digital cards — search, filter, flip, shuffle on any device
  • Five training modes — Spot-the-Bias Quiz, Swipe Deck, Pre-Flight, Blindspots, Journal
  • Curated Lenses + Decision Templates + Defense Playbook
  • Printable Deck PDFs + Field Guide e-book + Cheat Sheets + Anki Export
  • Every future improvement, included
Unlock  $29.50

30-day refund · no questions asked
