Berkson's Paradox

aka Berkson's Bias · Berkson's Fallacy · Collider Bias

Two unrelated traits appearing inversely linked because the sample only includes people who have at least one of them.

WHAT IT IS

The glitch, explained plainly.

Imagine you only visit restaurants that either serve really tasty food or look really pretty inside. You'd never go somewhere that's both ugly and bad. After a while, you'd start thinking, 'Hmm, the restaurants with great food usually look ugly, and the pretty ones usually have bad food.' But that's only because you never see the ugly restaurants with bad food; they're invisible to you because you'd never eat there. The pattern is fake; it comes from how you picked where to eat.

Berkson's Paradox occurs when restricting observations to a pre-filtered subset of a population creates an illusory negative correlation between two traits that are actually independent or even positively correlated in the general population. The filtering mechanism acts as a 'collider' — a common effect caused by both variables — and conditioning on that collider mathematically induces a spurious inverse relationship. For example, among hospitalized patients, two unrelated diseases may appear negatively associated because a patient without one disease must have had the other disease (or something else) to be admitted in the first place. The paradox is ubiquitous in any setting where observation requires passing through a selection gate based on a combination of attributes, from college admissions to dating pools to celebrity fame.
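The mechanism is easy to reproduce in a few lines. Here is a minimal sketch, assuming two traits drawn independently from a standard normal distribution and a made-up selection gate that lets a case through only when at least one trait exceeds a threshold. The threshold, sample size, and variable names are illustrative assumptions, not values from any study.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Two traits drawn independently for every member of the population.
population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20_000)]

# Selection gate (assumed rule): a case is observed only if at least
# one of its traits is unusually high.
THRESHOLD = 1.0
observed = [(a, b) for a, b in population if a > THRESHOLD or b > THRESHOLD]

r_population = pearson(*zip(*population))
r_observed = pearson(*zip(*observed))

print(f"full population: r = {r_population:+.2f}")
print(f"after selection: r = {r_observed:+.2f}")
```

In the full population the correlation should sit near zero, while the selected subset shows a clearly negative one, even though nothing links the two traits.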

SOUND FAMILIAR?

Where it shows up.

  1. Noticing that the fast-food places with great burgers always seem to have terrible fries, forgetting that places with both bad burgers and bad fries would never be visited.
  2. Concluding that attractive people on dating apps tend to have boring personalities, not realizing the only profiles getting swiped on are those clearing at least one threshold.

IN DIFFERENT DOMAINS

Where it shows up at work.

The same glitch looks different depending on the terrain. Finance, medicine, a relationship, a team — same mechanism, different costume.

Finance & investing

Credit risk models trained on approved loan applicants may find spurious negative correlations between income and credit history quality, because applicants lacking both attributes were already filtered out during the approval process, distorting the perceived relationship between financial indicators.
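The lending case can be sketched the same way. This is a toy simulation, assuming standardized income and credit-history scores that are mildly positively correlated among all applicants, and a hypothetical approval rule that accepts anyone whose combined score clears a cutoff. RHO, CUTOFF, and the scoring rule are assumptions made for illustration, not a real underwriting policy.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)
RHO = 0.3     # assumed true correlation across all applicants
CUTOFF = 1.6  # hypothetical approval cutoff on the combined score

applicants = []
for _ in range(50_000):
    income = rng.gauss(0, 1)
    # Build a credit score with correlation RHO to income.
    credit = RHO * income + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1)
    applicants.append((income, credit))

# Only approved applicants ever reach the training data.
approved = [(i, c) for i, c in applicants if i + c > CUTOFF]

r_all = pearson(*zip(*applicants))
r_approved = pearson(*zip(*approved))

print(f"all applicants: r = {r_all:+.2f}")
print(f"approved only:  r = {r_approved:+.2f}")
```

Under these assumptions the sign flips: a positive relationship among all applicants turns negative among the approved, which is the pattern a model trained only on approved loans would pick up.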

Medicine & diagnosis

Hospital-based case-control studies frequently discover false protective associations between unrelated diseases because the sample only includes people sick enough to be hospitalized — the absence of one condition in a patient implies something else caused their admission, creating an artifactual inverse relationship.
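A worked example makes the size of the artifact concrete. The sketch below computes exact odds ratios rather than simulating patients. It assumes two diseases that are independent in the general population, each of which can separately lead to admission, plus a small chance of admission for unrelated reasons. Every number here is an illustrative assumption.

```python
from itertools import product

# All numbers below are illustrative assumptions, not data from any study.
P_A, P_B = 0.10, 0.05  # prevalence of two independent diseases
H_A, H_B = 0.30, 0.40  # chance each disease on its own leads to admission
H_OTHER = 0.05         # chance of admission for any other reason

def p_admitted(has_a, has_b):
    """P(admitted) = 1 minus the chance that no cause triggers admission."""
    p_stay_out = 1 - H_OTHER
    if has_a:
        p_stay_out *= 1 - H_A
    if has_b:
        p_stay_out *= 1 - H_B
    return 1 - p_stay_out

def odds_ratio(cell):
    """Odds ratio of a 2x2 table given as {(has_a, has_b): weight}."""
    return (cell[1, 1] * cell[0, 0]) / (cell[1, 0] * cell[0, 1])

population, hospital = {}, {}
for a, b in product((0, 1), repeat=2):
    p = (P_A if a else 1 - P_A) * (P_B if b else 1 - P_B)
    population[a, b] = p                     # share of everyone
    hospital[a, b] = p * p_admitted(a, b)    # share who end up admitted

or_population = odds_ratio(population)
or_hospital = odds_ratio(hospital)

print(f"odds ratio, general population: {or_population:.2f}")
print(f"odds ratio, hospitalized only:  {or_hospital:.2f}")
```

With these inputs the population odds ratio is 1 (no association), while the hospital odds ratio comes out at about 0.21, which a naive reading would take as disease A strongly "protecting" against disease B.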

HOW TO SPOT IT

Ask yourself…

  • Am I drawing conclusions from a sample that was pre-filtered by some selection criterion that depends on the very variables I'm analyzing?
  • Who is missing from this dataset? What kinds of individuals or cases would never appear in my sample, and could their absence be creating a false pattern?

HOW TO DEFEND AGAINST IT

The playbook.

  • Always ask 'Who is missing from this sample?' before drawing conclusions about correlations — mentally reconstruct the full population including those excluded by the selection process.
  • Draw a simple causal diagram (directed acyclic graph) with your two variables and the selection criterion; if both variables point into the selection variable (a collider), conditioning on it will create a spurious association.
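The diagram check in the second item can be written down as a tiny data structure. This is a simplified sketch: the graph is a plain dictionary from each cause to its direct effects, the node names are hypothetical, and the helper only looks at direct arrows into the selection variable (it does not trace longer causal paths).

```python
# Hypothetical causal diagrams: each key is a cause, each value lists
# its direct effects. Node names are illustrative only.
hospital_dag = {
    "disease_a": ["hospitalized"],
    "disease_b": ["hospitalized"],
    "hospitalized": [],
}

survey_dag = {
    "disease_a": ["responded"],
    "disease_b": [],
    "responded": [],
}

def parents(graph, node):
    """Return the set of direct causes of `node`."""
    return {cause for cause, effects in graph.items() if node in effects}

def is_collider_between(graph, x, y, selector):
    """True if both x and y point directly into the selection variable."""
    return {x, y} <= parents(graph, selector)

for name, graph, selector in [
    ("hospital sample", hospital_dag, "hospitalized"),
    ("survey sample", survey_dag, "responded"),
]:
    risky = is_collider_between(graph, "disease_a", "disease_b", selector)
    verdict = "collider, expect a spurious association" if risky else "no direct collider"
    print(f"{name}: {verdict}")
```

The first diagram flags the selection variable as a collider because both diseases feed into it; the second does not, because only one of them affects who ends up in the sample.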
FAMOUS CASES

In history.

  • Joseph Berkson's 1946 paper, written while he was at the Mayo Clinic, used diabetes and cholecystitis to show how hospital data can produce a spurious association between two conditions, purely as an artifact of studying only hospitalized individuals rather than the general population.
  • During the COVID-19 pandemic, early hospital-based studies suggested smoking might be protective against severe COVID-19. This was later attributed, at least in part, to Berkson's Paradox, because hospitalization acted as a collider influenced by both smoking and COVID-19 severity.
  • The 'obesity paradox' in cardiovascular disease, in which obese patients with heart disease appeared to have better survival outcomes, has been partly attributed to Berkson's bias arising from conditioning on hospitalization or disease diagnosis.

WHERE IT COMES FROM
Academic origin

Joseph Berkson, 1946. Formalized in his paper 'Limitations of the Application of Fourfold Table Analysis to Hospital Data' published in Biometrics Bulletin, based on his analysis of hospital admission data at the Mayo Clinic. The concept gained wider acceptance after David Sackett's 1979 work providing strong evidence for the paradox's existence.

Evolutionary origin

Ancestral environments demanded rapid pattern detection from whatever information was locally available. Humans evolved to draw conclusions from the samples they could directly observe — their tribe, their territory, their immediate experience — without the statistical sophistication to recognize that their sample might be systematically filtered. In small-group survival contexts, the available sample was often close enough to the full population that this shortcut worked reasonably well.

IN AI SYSTEMS

How the machines inherit it.

Machine learning models trained on non-representative datasets are highly susceptible to Berkson's Paradox. When training data is filtered through a selection process (e.g., only approved loan applicants, only users who engaged, only patients who were hospitalized), models learn spurious negative correlations between features that are actually independent or positively correlated in the general population. This leads to biased predictions, unfair algorithmic decisions in hiring and lending, and recommendation systems that systematically undervalue balanced content. Social media algorithms trained on viral or niche content may falsely learn that popularity and engagement depth are inversely related.
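Here is a small sketch of how a model picks this up. It assumes two independent features (called popularity and depth here, following the example above) and a hypothetical logging rule in which a row reaches the training set only if the user engaged, with engagement more likely when either feature is high. The rule, the noise level, and the cutoff are all assumptions for illustration. The "model" is a one-variable least-squares fit.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(2)

# (popularity, depth) for every item; the two features are independent.
items = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40_000)]

# Assumed logging rule: an item is kept for training only if a noisy
# combination of its two features clears an engagement cutoff.
training = [
    (pop, depth)
    for pop, depth in items
    if pop + depth + rng.gauss(0, 0.5) > 1.5
]

slope_all = ols_slope(*zip(*items))
slope_training = ols_slope(*zip(*training))

print(f"slope fitted on all items:      {slope_all:+.2f}")
print(f"slope fitted on logged items:   {slope_training:+.2f}")
```

Fitted on everything, the slope should be close to zero. Fitted on the logged rows alone, it turns clearly negative, so the model learns that more popular items are shallower even though the data-generating process contains no such link.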
