Congruence Bias

aka Congruence Heuristic · Direct Testing Bias

Testing a hypothesis only by looking for evidence that confirms it, while never testing alternatives that could disprove it.

Illustration: Congruence Bias
WHAT IT IS

The glitch, explained plainly.

Imagine you think your toy car only goes fast on the carpet. So you keep testing it on the carpet — and sure enough, it goes fast every time! You feel really smart. But you never tried the kitchen floor, the sidewalk, or a ramp. If you had, you'd have found out it goes fast on everything — the carpet had nothing to do with it. That's congruence bias: you only test your one guess and never try anything that could show you're wrong.

Congruence bias describes the systematic preference for direct testing of one's initial hypothesis — designing experiments or asking questions whose affirmative answers support the favored explanation — while failing to pursue indirect tests or explore competing explanations. Unlike the broader confirmation bias, which encompasses selective attention, interpretation, and memory, congruence bias operates specifically at the level of test selection and experimental design. People default to asking 'Would I see this if my idea is right?' rather than the more diagnostic 'What would I see if my idea is wrong?' This narrowing of the search space means that even when evidence consistently 'confirms' the hypothesis, the person may never discover that a simpler, broader, or entirely different explanation fits the data equally well or better.

SOUND FAMILIAR?

Where it shows up.

  1. A software engineer notices intermittent crashes and suspects a specific memory leak in module A. She writes twelve unit tests that all exercise module A in various ways, and each test passes cleanly. She reports 'Module A is fine — it must be something else,' but she never wrote a single test for modules B, C, or D, any of which could be the actual source of the crash.
  2. A teacher notices that students perform better when she uses visual slides. She runs the next five lessons with visual slides and confirms improved scores each time. She concludes slides are the key factor, but she never tested lessons with no slides, different teaching styles, or different times of day — any of which might equally explain the improvement.
  3. A marketing analyst believes a drop in sales is due to a competitor's new pricing. She pulls report after report comparing their prices to the competitor's, finding overlap each time. She never investigates whether a seasonal trend, a supply chain disruption, or a change in her company's own ad spend might be driving the dip.
  4. A physician suspects a patient's fatigue is caused by thyroid dysfunction. The thyroid panel comes back borderline, which she considers partially supportive. She orders a second thyroid test with a different methodology. She never orders a sleep study, checks for anemia, or screens for depression — alternative diagnoses that could also explain every symptom.
  5. A data scientist builds a fraud-detection model and tests it against a dataset of known fraudulent transactions, finding high accuracy. Satisfied, he deploys it. He never tested the model against tricky non-fraudulent transactions that share surface features with fraud, which would have revealed a high false-positive rate and shown that a simpler rule set could outperform his model.
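The fraud-detection example can be made concrete with a minimal sketch. The rule, the transactions, and the field names below are all hypothetical, invented for illustration; the point is that evaluating only on known-fraud cases produces perfect recall while hiding a false-positive rate that only a test of hard negatives can reveal.

```python
# Hypothetical rule-based "model": flag any large overseas transaction.
def flags_as_fraud(txn):
    return txn["amount"] > 5000 and txn["overseas"]

# The congruent test set: transactions already known to be fraud.
known_fraud = [
    {"amount": 9000, "overseas": True},
    {"amount": 7500, "overseas": True},
]

# Hard negatives: legitimate transactions that share surface features
# with fraud. The congruence-biased tester never builds this set.
hard_negatives = [
    {"amount": 8000, "overseas": True},  # a customer on vacation
    {"amount": 6200, "overseas": True},  # a routine supplier payment
]

recall = sum(flags_as_fraud(t) for t in known_fraud) / len(known_fraud)
false_positive_rate = sum(flags_as_fraud(t) for t in hard_negatives) / len(hard_negatives)

print(f"recall on known fraud: {recall:.0%}")         # 100%, looks perfect
print(f"false-positive rate: {false_positive_rate:.0%}")  # 100%, visible only via this test
```

Run only the first evaluation and the model looks flawless; run the second and it is clearly unusable.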
IN DIFFERENT DOMAINS

Where it shows up at work.

The same glitch looks different depending on the terrain. Finance, medicine, a relationship, a team — same mechanism, different costume.

Finance & investing

Analysts who believe a stock is undervalued tend to gather more financial metrics that support the buy thesis — favorable P/E ratios, revenue growth — while neglecting to seriously investigate bear-case indicators like rising debt, insider selling, or declining market share that would test the thesis from the opposite direction.

Medicine & diagnosis

Clinicians who form an early diagnostic impression tend to order tests that would confirm that specific condition rather than tests designed to differentiate between multiple plausible diagnoses. This can lead to delayed identification of the actual illness, especially when the initial hypothesis is a more common condition that shares symptoms with rarer alternatives.

Education & grading

Teachers who believe a particular instructional method works tend to keep applying and measuring it under favorable conditions rather than systematically varying conditions or trying rival approaches, making it impossible to determine whether the method is truly superior or whether other factors drive student improvement.

Relationships

When someone suspects their partner is losing interest, they tend to scrutinize interactions for signs of distance — late replies, shorter conversations — without equally attending to evidence of continued investment, or considering that the partner may simply be busy or stressed for unrelated reasons.

Tech & product

Product teams who hypothesize that a new feature drives engagement tend to A/B test only that feature versus a control, without testing whether a different feature, a UX improvement, or simply better onboarding would produce equal or greater engagement gains.

Workplace & hiring

Hiring managers who believe a candidate is strong tend to ask interview questions that let the candidate demonstrate strengths rather than probing for weaknesses or comparing the candidate's approach against what alternative candidates might offer in the same scenarios.

Politics & media

Investigators and journalists who develop an early theory about a political scandal tend to pursue leads and sources that corroborate that narrative while neglecting to interview dissenting voices or examine evidence that might support an entirely different explanation for the same events.

HOW TO SPOT IT

Ask yourself…

  • Am I only looking for evidence that my current explanation is right, or am I also looking for evidence that would prove it wrong?
  • Have I seriously considered at least two alternative explanations and designed a test that would distinguish between them?
  • If someone with the opposite hypothesis ran this same test, would the result change their mind — or would it confirm their view too?
HOW TO DEFEND AGAINST IT

The playbook.

  • Before testing, force yourself to write down at least two alternative hypotheses and design one test that would distinguish between them.
  • Use Baron's diagnostic question: 'How likely is a yes answer if I assume my hypothesis is false?' If a yes is likely either way, the test is uninformative.
  • Adopt the 'murder board' technique: assign someone (or yourself) the explicit role of attacking your hypothesis and proposing rival explanations.
  • Practice indirect testing: instead of asking 'Does my idea work?', ask 'What would I expect to see if my idea is wrong?' and look for that.
  • In teams, implement structured analytic techniques like Analysis of Competing Hypotheses (ACH), where evidence is explicitly evaluated against multiple explanations simultaneously.
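Baron's diagnostic question from the playbook above can be expressed as a likelihood ratio: a 'yes' result is informative only if its probability when the hypothesis is true differs from its probability when the hypothesis is false. A minimal sketch, with illustrative probabilities (not figures from the paper), using the toy-car example:

```python
def diagnosticity(p_yes_if_true, p_yes_if_false):
    """Likelihood ratio of a 'yes' answer; values near 1.0 mean the test
    is uninformative because a 'yes' is likely either way."""
    return p_yes_if_true / p_yes_if_false

# Congruent test: the car goes fast on carpet whether or not carpet matters.
print(diagnosticity(0.95, 0.90))  # ≈ 1.06, barely informative

# Discriminating test: try the kitchen floor, where the hypotheses disagree.
print(diagnosticity(0.10, 0.90))  # ≈ 0.11, a 'yes' strongly favors the alternative
```

A test worth running is one whose likelihood ratio sits far from 1.0 in either direction.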
FAMOUS CASES

In history.

  • The Challenger space shuttle disaster (1986): engineers repeatedly tested whether O-ring erosion was within acceptable limits under their existing model, rather than testing the alternative hypothesis that cold temperatures fundamentally changed O-ring behavior — a failure of indirect testing that contributed to the launch decision.
  • Early COVID-19 testing protocols (2020): several countries initially tested only patients matching a narrow travel-history profile, repeatedly confirming that travelers were infected while failing to test community members without travel history — delaying recognition of widespread local transmission.
WHERE IT COMES FROM
Academic origin

The concept originates from Peter Wason's 2-4-6 task experiments (1960), which demonstrated the positive test strategy in hypothesis testing. The term 'congruence heuristic' was formally introduced by Jonathan Baron, Jane Beattie, and John C. Hershey in their 1988 paper 'Heuristics and biases in diagnostic reasoning: II. Congruence, information, and certainty' published in Organizational Behavior and Human Decision Processes.
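Wason's 2-4-6 task is simple enough to sketch directly. Participants saw the triple 2-4-6 and had to discover the hidden rule by proposing their own triples; most tested only triples congruent with a guess like 'increase by 2', so they never learned the real rule was far broader. The code below is a toy reconstruction, not Wason's protocol:

```python
def hidden_rule(triple):
    """The experimenter's actual rule: any strictly ascending triple."""
    a, b, c = triple
    return a < b < c

# Congruent tests: every triple fits the narrower guess "add 2",
# so every answer comes back 'yes' and the guess feels confirmed.
congruent_tests = [(2, 4, 6), (10, 12, 14), (1, 3, 5)]

# A disconfirming test deliberately violates "add 2".
disconfirming_test = (1, 2, 10)

print([hidden_rule(t) for t in congruent_tests])  # [True, True, True]
print(hidden_rule(disconfirming_test))            # True: "add 2" was never the rule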

Evolutionary origin

In ancestral environments, rapid hypothesis formation and action were more survival-relevant than exhaustive testing. If a rustling bush was assumed to be a predator, running away (confirming the threat hypothesis through direct avoidance) was safer than pausing to systematically test whether it was wind, a small animal, or a predator. A strategy of 'test the most salient hypothesis quickly and act on positive evidence' conserved time and cognitive resources when false negatives were far costlier than false positives.

IN AI SYSTEMS

How the machines inherit it.

Machine learning models can exhibit congruence bias when training pipelines evaluate model performance only on data distributions that match the original hypothesis about what the model should learn, without adversarial testing or out-of-distribution evaluation. Hyperparameter tuning that optimizes for a single metric on a single validation set repeatedly confirms the model's adequacy without probing failure modes, edge cases, or alternative architectures that might reveal the model's limitations.
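As a minimal sketch of the evaluation pattern described above, the toy 'model' below (a trivial threshold classifier; all names and data are illustrative) is scored both on the distribution it was designed for and on a shifted distribution. The congruent test keeps confirming the model; the indirect test exposes the failure mode.

```python
def classify(x, threshold=0.5):
    """Trivial stand-in for a trained model."""
    return 1 if x >= threshold else 0

def accuracy(samples):
    return sum(classify(x) == y for x, y in samples) / len(samples)

# In-distribution validation set: positives cluster above the threshold.
in_dist = [(0.9, 1), (0.8, 1), (0.2, 0), (0.1, 0)]

# Shifted (out-of-distribution) set: positives now sit below the threshold.
shifted = [(0.4, 1), (0.3, 1), (0.2, 0), (0.1, 0)]

print(accuracy(in_dist))  # 1.0 on the congruent test
print(accuracy(shifted))  # 0.5 on the shifted test
```

Only the second evaluation would prompt a rethink of the threshold, the features, or the architecture itself.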

FREE FIELD ZINE

10 glitches quietly running your life.

A free field-zine PDF — ten cognitive glitches named, illustrated, with a defense move for each. Plus the weekly Glitch Report on Fridays — one bias named, two spotted in the wild, one defense move. Unsubscribe any time.


LAUNCH PRICE

Train against your blindspots.

50 cards are free to preview. Buyers unlock the rest of the deck plus the interactive training — unlimited Spot-the-Bias Quiz, Swipe Deck with spaced repetition, My Blindspots, Decision Pre-Flight, the Printable Deck + Cheat Sheets, and the Field Guide e-book. $29.50 (regularly $59).

Unlock the full deck

Everything below — yours forever. Pay once, use across every device.

Half-off launch — limited to the first 100 readers. Auto-applied at checkout.
$59 $29.50
one-time payment · lifetime access
  • All interactive digital cards — search, filter, flip, shuffle on any device
  • Five training modes — Spot-the-Bias Quiz, Swipe Deck, Pre-Flight, Blindspots, Journal
  • Curated Lenses + Decision Templates + Defense Playbook
  • Printable Deck PDFs + Field Guide e-book + Cheat Sheets + Anki Export
  • Every future improvement, included
Unlock  $29.50

30-day refund · no questions asked