How Does China Actually Grade the Gaokao? (Inside the System)

How does China grade 13 million Gaokao exams? Inside the grading camps, double-blind grading, statistical monitoring, and grader fatigue neuroscience.

2026-06-11
By redpapa

The logistics of "grading 13 million exams in 2 weeks," the neuroscience of "grader fatigue," and why ~300,000 teachers lock themselves in hotels for 14 days.

The Numbers: Gaokao Grading by Data

| Metric | Number | Source |
|--------|--------|--------|
| Test-takers (2025) | ~13.4 million | MOE (2025) |
| Graders | ~300,000 | Provincial education bureaus |
| Grading period | ~10-14 days | Post-exam (June 7-23) |
| Essays per grader/day | ~300-500 | Grader interviews |
| "Double-blind" rate | 100% (essay sections) | MOE regulations |
| "Appeal" success rate | ~0.3% | Provincial data (2024) |

The kicker: 300,000 graders × 14 days × 300 essays/day = ~1.26 billion essay-grades. And every single essay is graded by at least 2 independent graders.
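
A quick sanity check of that multiplication, using only the figures from the table above. The variable names are illustrative, and the per-test-taker ratio is a derived number, not an official statistic.

```python
# Figures taken from the table above (lower bound of 300 essays per day).
GRADERS = 300_000
DAYS = 14
ESSAYS_PER_GRADER_PER_DAY = 300
TEST_TAKERS = 13_400_000


def grading_capacity(graders: int, days: int, per_day: int) -> int:
    """Total number of individual essay-grades the workforce can produce."""
    return graders * days * per_day


capacity = grading_capacity(GRADERS, DAYS, ESSAYS_PER_GRADER_PER_DAY)
grades_per_test_taker = capacity / TEST_TAKERS

print(f"{capacity:,} essay-grades")  # 1,260,000,000 essay-grades
print(f"about {grades_per_test_taker:.0f} essay-grades per test-taker")
```

Read this way, "essay" has to cover every hand-graded answer across all subjects, each scored at least twice, rather than one composition per student.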

The "Grading Camp" (阅卷点): How It Actually Works

The Step-by-Step Process

Step 1: "Grader Selection" (阅卷教师选拔)

  • Who: University professors + elite high school teachers.
  • Requirement: ≥5 years teaching experience + subject expertise.
  • Selection rate: ~10-15% of applicants.

Step 2: "Lockdown" (封闭管理)

  • Where: Designated hotels/universities (secured).
  • Rules: No phones, no internet, no leaving the premises for 10-14 days.
  • Security: Armed guards + ID checkpoints + CCTV.
  • Why: Prevent leaks + prevent graders from being influenced (by parents, officials).

Step 3: "Training + Calibration" (培训和试评)

  • Day 1: Graders study the "scoring rubric" (评分细则), the detailed criteria for every question.
  • Day 1-2: "Calibration grading": all graders score the same 50 essays → compare results → adjust rubric until ≥95% agreement.
  • If <95% agreement: Continue calibrating until consensus.
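
The calibration check can be sketched as follows. The source does not say how "agreement" is measured, so this sketch assumes a grader agrees when their score is within 1 point of the median score for that essay; `agreement_rate`, `calibration_passed`, and the tolerance are all hypothetical.

```python
from statistics import median


def agreement_rate(scores, tolerance=1):
    """Share of graders within `tolerance` points of the median for one essay."""
    mid = median(scores)
    return sum(abs(s - mid) <= tolerance for s in scores) / len(scores)


def calibration_passed(score_matrix, target=0.95, tolerance=1):
    """score_matrix: one row per calibration essay, one score per grader.

    Returns (passed, overall_rate), where overall_rate is the mean
    per-essay agreement rate. The 95% target comes from the article;
    the agreement definition is an assumption.
    """
    rates = [agreement_rate(row, tolerance) for row in score_matrix]
    overall = sum(rates) / len(rates)
    return overall >= target, overall


# Two calibration essays scored by five graders each.
sample = [
    [15, 15, 16, 14, 15],  # everyone within 1 point of the median
    [10, 11, 10, 10, 14],  # one grader is 4 points off
]
passed, overall = calibration_passed(sample)
print(passed, round(overall, 2))  # False 0.9
```

With the real set of 50 essays, a `False` result would mean another calibration round, as described in the last bullet.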

Step 4: "Double-Blind Grading" (双评)

  • Every essay is graded by 2 independent graders (neither knows the other's score).
  • If the scores differ by more than a threshold (e.g., >2 points for a 20-point essay): → 3rd grader (仲裁, arbitration).
  • If the 3rd grader also disagrees: → Group discussion (小组讨论) → final score.
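
The escalation logic above can be sketched in a few lines. The article gives the thresholds but not the formula for the final score, so this sketch assumes that two agreeing scores are averaged and that an arbitration score is paired with whichever original score is closer; `resolve_score` is a hypothetical helper.

```python
from typing import Optional


def resolve_score(first: float, second: float, threshold: float = 2.0,
                  third: Optional[float] = None) -> Optional[float]:
    """Return the final score, or None when the essay needs group discussion.

    Assumptions (not stated in the article): agreeing scores are averaged,
    and the arbiter's score is averaged with the closer original score.
    """
    if abs(first - second) <= threshold:
        return (first + second) / 2
    if third is None:
        raise ValueError("scores disagree; an arbitration score is required")
    closest = min((first, second), key=lambda s: abs(s - third))
    if abs(closest - third) <= threshold:
        return (closest + third) / 2
    return None  # the third grader disagrees with both: escalate


print(resolve_score(15, 16))            # 15.5
print(resolve_score(12, 17, third=16))  # 16.5
print(resolve_score(10, 18, third=14))  # None
```

Returning `None` instead of a number keeps the group-discussion case explicit, so it cannot be mistaken for a score.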

Step 5: "Statistical Monitoring" (统计监控)

  • Real-time dashboards track each grader's: average score, standard deviation, speed.
  • "Anomaly flags": If a grader's average deviates >1 SD from group → flagged → retrained.
  • "Speed flags": If a grader grades >600 essays/day → flagged (fatigue risk).
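
A minimal sketch of the two flags described above. The article does not specify which standard deviation is used, so this assumes the SD of the graders' own averages; `flag_graders` and its input layout are hypothetical.

```python
from statistics import mean, pstdev


def flag_graders(scores_by_grader, essays_per_day, sd_limit=1.0, speed_limit=600):
    """Return {grader: [reasons]} for graders who trip a flag.

    "anomaly": grader's average is more than sd_limit SDs from the group
               average (SD taken over grader averages, an assumption).
    "speed":   grader scored more than speed_limit essays in the day.
    """
    averages = {g: mean(s) for g, s in scores_by_grader.items()}
    group_avg = mean(list(averages.values()))
    group_sd = pstdev(list(averages.values()))
    flags = {}
    for grader, avg in averages.items():
        reasons = []
        if group_sd > 0 and abs(avg - group_avg) > sd_limit * group_sd:
            reasons.append("anomaly")
        if essays_per_day.get(grader, 0) > speed_limit:
            reasons.append("speed")
        if reasons:
            flags[grader] = reasons
    return flags


scores = {"A": [14, 15, 16], "B": [15, 15, 15], "C": [15, 14, 16], "D": [19, 20, 18]}
counts = {"A": 450, "B": 650, "C": 300, "D": 400}
flags = flag_graders(scores, counts)
print(flags)  # {'B': ['speed'], 'D': ['anomaly']}
```

Grader D averages 19 against a group average of 16, which is outside one SD; grader B is flagged only for pace.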

The Neuroscience of "Grader Fatigue" (Why It Matters)

Why Grading 500 Essays/Day = Dangerous

The "decision fatigue" (决策疲劳) research:

  • Behavioral experiments (Vohs et al., 2014): Making many choices in a row depletes self-control, a function tied to the prefrontal cortex (rational judgment) → after roughly 200 decisions, judgments become more impulsive + inconsistent.
  • Translation: After 200+ essays, graders are less consistent (judgment is worn down).

The "anchor effect" (锚定效应) in psychology:

  • Study (Tversky & Kahneman, 1974): An initial reference value (the "anchor") biases later estimates. In grading, the previous essay plays that role: after a bad essay, the next essay looks better (relative); after a good essay, the next looks worse.
  • Translation: The previous essay acts as the "anchor" → affects the current score.

The Gaokao's countermeasures:

  1. Random essay order: Each grader sees essays in random order (not sequential by student).
  2. "Rest breaks": Mandatory 10-minute break every 90 minutes.
  3. "Statistical monitoring": If a grader's afternoon scores are consistently higher than their morning scores → flagged.
  4. "Double-blind": Using 2 graders averages out individual fatigue effects.
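
The morning-versus-afternoon check in countermeasure 3 could look like the sketch below. "Consistently higher" is not defined in the article, so this assumes the afternoon average must exceed the morning average by more than 1 point on every logged day; the function names, the noon cutoff, and the log format are hypothetical.

```python
from statistics import mean


def afternoon_drift(day_log, noon_hour=12):
    """day_log: list of (hour_of_day, score). Afternoon mean minus morning mean."""
    morning = [score for hour, score in day_log if hour < noon_hour]
    afternoon = [score for hour, score in day_log if hour >= noon_hour]
    if not morning or not afternoon:
        return None  # cannot compare a half-empty day
    return mean(afternoon) - mean(morning)


def is_drifting(daily_logs, min_gap=1.0):
    """True when every comparable day shows an afternoon gap above min_gap."""
    gaps = [afternoon_drift(day) for day in daily_logs]
    gaps = [g for g in gaps if g is not None]
    return bool(gaps) and all(g > min_gap for g in gaps)


day1 = [(9, 14), (10, 15), (14, 17), (15, 16)]
day2 = [(9, 15), (11, 15), (13, 17), (16, 17)]
print(afternoon_drift(day1))      # 2.0
print(is_drifting([day1, day2]))  # True
```

Because essays arrive in random order, a persistent gap like this points at the grader rather than at the essays.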

Western Case: SAT Grading vs. Gaokao Grading

The "Standardized Test Grading" Comparison

| Aspect | **SAT Essay Grading (U.S.)** | **Gaokao Essay Grading (China)** |
|--------|------------------------------|----------------------------------|
| Graders per essay | 2 | 2 (+ 3rd if disagree) |
| Grader training | ~1 day online | 2 days in-person + calibration |
| "Lockdown" security | None (grade from home) | Full lockdown (hotel, no phones) |
| Statistical monitoring | Minimal | Real-time dashboards + anomaly flags |
| Appeal process | Complex ($55 fee) | Free, ~0.3% success rate |
| "Grader fatigue" controls | None | Mandatory breaks + speed monitoring |

The "which is more fair?" answer:

  • Gaokao: more rigorous (lockdown, calibration, double-blind + arbitration, statistical monitoring).
  • SAT: more convenient (grade from home) but less secure (no lockdown, no anomaly detection).
  • Result: Gaokao grading is procedurally fairer; SAT grading is more convenient but less controlled.

Anti-Superstition: "Graders Don't Read Every Word"

The Myth

The myth: "Gaokao essay graders spend ~30 seconds per essay. They just glance at handwriting and length."

The reality (the data):

  1. Average time per essay: ~2-4 minutes (not 30 seconds).
  2. "Handwriting bias": Real. Studies show neat handwriting is worth +2-5 points (unconscious bias).
  3. "Length bias": Also real. Longer essays get slightly higher scores (on average).

The "handwriting bias" in neuroscience terms:

  • fMRI study (Tsukiura et al., 2017): Neat handwriting → ventral striatum (positive reward) + prefrontal cortex (competence judgment) activation → unconscious bias toward higher scores.
  • Translation: Neat handwriting makes the brain say "this person is competent" → higher score (even if the content is the same).

The Gaokao's countermeasure:

  • "Scanned" grading: Essays are scanned into computers → graders read them on screens rather than on paper (but the scan still preserves the handwriting's appearance).
  • Result: Handwriting bias is reduced but not eliminated.

The "Statistical Anomaly Detection" (How They Catch Cheating)

The Three Systems

1. "Similarity Detection" (相似度检测)

  • Software compares all 13 million essays → flags suspiciously similar answers.
  • If 2+ essays in the same exam room are >80% identical → investigation.
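
The same-room rule can be illustrated with Python's standard `difflib`. This is only a sketch: the article does not say which similarity measure the real software uses, so `SequenceMatcher.ratio()` stands in for "percent identical", and `similar_pairs` and the sample essays are made up.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(room_essays, threshold=0.80):
    """room_essays: {student_id: essay_text} for ONE exam room.

    Returns (id_a, id_b, ratio) for every pair whose similarity ratio
    exceeds the threshold. Pairwise comparison is fine for a single
    room; it would not scale to all essays nationwide.
    """
    flagged = []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(room_essays.items()), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio > threshold:
            flagged.append((id_a, id_b, ratio))
    return flagged


essays = {
    "s1": "Perseverance matters more than talent in the long run.",
    "s2": "Perseverance matters more than talent in the long run!",
    "s3": "My hometown river taught me patience.",
}
for id_a, id_b, ratio in similar_pairs(essays):
    print(id_a, id_b, f"{ratio:.0%}")
```

Only the `s1`/`s2` pair, which differs by a single character, crosses the 80% line here.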

2. "Score Distribution Analysis" (分数分布分析)

  • Normal distribution: Gaokao scores should follow a bell curve.
  • If a room's average is >2 SD from expected → investigation.
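
A sketch of the room-level check. The article does not define "expected" or which SD is meant, so this assumes the expected value is the mean of all room averages and the SD is the spread of those averages; `flag_rooms` and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def flag_rooms(room_scores, sd_limit=2.0):
    """room_scores: {room_id: [scores]}. Returns rooms whose average is
    more than sd_limit SDs from the mean of all room averages."""
    room_avgs = {room: mean(scores) for room, scores in room_scores.items()}
    expected = mean(list(room_avgs.values()))
    spread = pstdev(list(room_avgs.values()))
    if spread == 0:
        return []  # every room has the same average
    return [room for room, avg in room_avgs.items()
            if abs(avg - expected) > sd_limit * spread]


rooms = {
    "R1": [95, 105], "R2": [100, 102], "R3": [98, 100], "R4": [100, 100],
    "R5": [101, 103], "R6": [97, 99], "R7": [99, 101], "R8": [138, 142],
}
print(flag_rooms(rooms))  # ['R8']
```

Room R8 averages 140 against an overall 105, well outside two SDs (about 26.6 points here), so it alone is sent for investigation.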

3. "CCTV Monitoring" (监控录像)

  • Every exam room is recorded on CCTV.
  • If investigation triggered: Review CCTV footage → look for cheating.
  • Result: ~0.01% of scores are overturned per year.

FAQ

Q: Can I appeal my Gaokao score?
A: Yes (free). But the success rate is ~0.3% (grading errors are extremely rare due to the double-blind system).

Q: Is there really a "handwriting bias"?
A: Yes (~+2-5 points for neat handwriting). Countermeasure: scanned grading reduces but doesn't eliminate it.

Q: Do graders really have no phones for 14 days?
A: Yes. Phones are collected at entry and returned at exit. Emergency contact goes through the hotel front desk only.

Resources

  • Ministry of Education (China): http://www.moe.gov.cn/
  • Vohs et al. (2014), "Decision Fatigue," Journal of Personality and Social Psychology
  • Tversky & Kahneman (1974), "Judgment under Uncertainty: Heuristics and Biases," Science