The logistics of "grading 13 million exams in 2 weeks," the neuroscience of "grader fatigue," and why ~300,000 teachers lock themselves in hotels for 14 days.
The Numbers: Gaokao Grading by Data
| Metric | Number | Source |
|--------|--------|--------|
| Test-takers (2025) | ~13.4 million | MOE (2025) |
| Graders | ~300,000 | Provincial education bureaus |
| Grading period | ~10-14 days | Post-exam (June 7-23) |
| Essays per grader per day | ~300-500 | Grader interviews |
| "Double-blind" rate | 100% (essay sections) | MOE regulations |
| "Appeal" success rate | ~0.3% | Provincial data (2024) |
The kicker: 300,000 graders × 14 days × 300 essays/day = ~1.26 billion individual grading actions. And every single essay is graded by at least 2 independent graders.
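The arithmetic is easy to check. Below is a minimal Python sketch that uses the table's approximate figures, taking the low end of the per-day range; the constant and function names are illustrative. It also compares that capacity with one double-graded essay per student. The gap suggests the 1.26 billion figure counts every subjective answer a grader scores, not just essays. That reading is an inference, not something the source states.

```python
# Back-of-envelope check of the grading-capacity arithmetic.
# Inputs are the document's approximate figures, not official data.
GRADERS = 300_000
DAYS = 14
ESSAYS_PER_GRADER_DAY = 300   # low end of the ~300-500 range
TEST_TAKERS = 13_400_000
READS_PER_ESSAY = 2           # double-blind rule; ignores 3rd-grader arbitration


def grading_capacity(graders: int, days: int, per_day: int) -> int:
    """Total individual grading actions the workforce can perform."""
    return graders * days * per_day


def reads_needed(students: int, reads: int) -> int:
    """Grading actions needed if each student writes one essay read `reads` times."""
    return students * reads


capacity = grading_capacity(GRADERS, DAYS, ESSAYS_PER_GRADER_DAY)
needed = reads_needed(TEST_TAKERS, READS_PER_ESSAY)
print(f"capacity: {capacity:,}")                             # capacity: 1,260,000,000
print(f"one double-graded essay per student: {needed:,}")   # ...: 26,800,000
print(f"capacity / need: {capacity / needed:.0f}x")          # capacity / need: 47x
```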
The "Grading Camp" (阅卷点): How It Actually Works
The Step-by-Step Process
Step 1: "Grader Selection" (阅卷教师选拔)
- Who: University professors + elite high school teachers.
- Requirement: ≥5 years teaching experience + subject expertise.
- Selection rate: ~10-15% of applicants.
Step 2: "Lockdown" (封闭管理)
- Where: Designated hotels/universities (secured).
- Rules: No phones, no internet, no leaving the premises for 10-14 days.
- Security: Armed guards + ID checkpoints + CCTV.
- Why: Prevent leaks + prevent graders from being influenced (by parents, officials).
Step 3: "Training + Calibration" (培训和试评)
- Day 1: Graders study the "scoring rubric" (评分细则): detailed criteria for every question.
- Day 1-2: "Calibration grading": all graders score the same 50 essays → compare results → adjust rubric until ≥95% agreement.
- If <95% agreement: Continue calibrating until consensus.
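The calibration loop depends on measuring agreement. This minimal sketch assumes that "agreement" means a grader's score falls within a tolerance band of the median score for that essay. The ±2-point band is a hypothetical value. The source does not define the metric, and `agreement_rate` and `is_calibrated` are illustrative names.

```python
from statistics import median


def agreement_rate(scores: dict[str, list[int]], tolerance: int = 2) -> float:
    """Share of (grader, essay) scores within `tolerance` of that essay's median.

    `scores` maps grader id -> scores on the same calibration essays, in the
    same order. The median reference and tolerance band are assumptions.
    """
    per_essay = zip(*scores.values())  # one tuple of all graders' scores per essay
    hits = total = 0
    for essay_scores in per_essay:
        reference = median(essay_scores)
        for s in essay_scores:
            if abs(s - reference) <= tolerance:
                hits += 1
            total += 1
    return hits / total


def is_calibrated(scores: dict[str, list[int]], target: float = 0.95,
                  tolerance: int = 2) -> bool:
    """True once agreement reaches the target (the document's >=95% rule)."""
    return agreement_rate(scores, tolerance) >= target


calibration = {
    "A": [15, 12, 18],
    "B": [16, 12, 17],
    "C": [11, 13, 18],
}
print(agreement_rate(calibration))  # 8 of 9 scores agree -> 0.888...
print(is_calibrated(calibration))   # False: keep calibrating
```

The median is used as the reference so that a single outlying grader does not drag the "consensus" toward themselves.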
Step 4: "Double-Blind Grading" (双评)
- Every essay = graded by 2 independent graders (neither knows the other's score).
- If scores differ by more than a threshold (e.g., >2 points for a 20-point essay): → 3rd grader (仲裁, arbitration).
- If the 3rd grader disagrees: → Group discussion (小组讨论) → final score.
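The escalation path fits in a small function. This is a minimal sketch. The source only describes when escalation happens, so the way the final score is computed here is an assumed rule for illustration: average two agreeing scores, or average the 3rd score with the closer original. `resolve_score` is a hypothetical name.

```python
from typing import Optional


def resolve_score(first: float, second: float, threshold: float,
                  third: Optional[float] = None) -> Optional[float]:
    """Combine two blind scores, escalating when they diverge.

    Assumed rules (illustrative):
      - |first - second| <= threshold: final = mean of the two
      - otherwise a 3rd grader scores; if that score is within threshold
        of the closer original score, final = mean of those two
      - if the 3rd score is far from both: None (send to group discussion)
    """
    if abs(first - second) <= threshold:
        return (first + second) / 2
    if third is None:
        raise ValueError("scores diverge: an arbitration (3rd) score is required")
    closer = min((first, second), key=lambda s: abs(s - third))
    if abs(closer - third) <= threshold:
        return (closer + third) / 2
    return None  # escalate to group discussion


print(resolve_score(14, 15, threshold=2))            # 14.5
print(resolve_score(10, 16, threshold=2, third=15))  # 15.5
print(resolve_score(10, 18, threshold=2, third=14))  # None
```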
Step 5: "Statistical Monitoring" (统计监控)
- Real-time dashboards track each grader's: average score, standard deviation, speed.
- "Anomaly flags": If a grader's average deviates >1 SD from the group → flagged → retrained.
- "Speed flags": If a grader grades >600 essays/day โ flagged (fatigue risk).
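Both flags reduce to simple comparisons. This minimal sketch assumes that each grader's average score is compared with the mean and sample standard deviation of all graders' averages. The source does not say which SD is used. The 1 SD and 600-per-day limits come from the text, and `flag_graders` is a hypothetical name.

```python
from statistics import mean, stdev


def flag_graders(avg_scores: dict[str, float], daily_counts: dict[str, int],
                 sd_limit: float = 1.0, speed_limit: int = 600) -> dict[str, list[str]]:
    """Return {grader: [reasons]} for graders tripping an anomaly or speed flag."""
    averages = list(avg_scores.values())
    group_mean, group_sd = mean(averages), stdev(averages)  # needs >= 2 graders
    flags: dict[str, list[str]] = {}
    for grader, avg in avg_scores.items():
        reasons = []
        if group_sd > 0 and abs(avg - group_mean) > sd_limit * group_sd:
            reasons.append("anomaly")   # average far from the group -> retrain
        if daily_counts.get(grader, 0) > speed_limit:
            reasons.append("speed")     # too many essays today -> fatigue risk
        if reasons:
            flags[grader] = reasons
    return flags


averages = {"A": 40.0, "B": 41.0, "C": 39.0, "D": 48.0}
counts = {"A": 450, "B": 620, "C": 300, "D": 400}
print(flag_graders(averages, counts))  # {'B': ['speed'], 'D': ['anomaly']}
```

If grader averages were roughly normal, a 1 SD cutoff would flag about a third of all graders. A real system would plausibly use a stricter cutoff or require repeated flags, which is why `sd_limit` is a parameter.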
The Neuroscience of "Grader Fatigue" (Why It Matters)
Why Grading 500 Essays/Day = Dangerous
The "decision fatigue" effect (决策疲劳):
- Behavioral study (Vohs et al., 2014): Making many choices in a row impairs subsequent self-control, so later decisions become more impulsive and less consistent.
- Translation: After hundreds of essays in a day, graders become less consistent.
The "anchor effect" (锚定效应), from cognitive psychology:
- Study (Tversky & Kahneman, 1974): An initial reference value biases later judgments. In grading, this shows up as a sequence effect: after a bad essay, the next one looks better by comparison; after a good essay, the next one looks worse.
- Translation: The previous essay acts as an "anchor" that shifts the current score.
The Gaokao's countermeasures:
- Random essay order: Each grader sees essays in random order (not sequential by student).
- "Rest breaks": Mandatory 10-minute break every 90 minutes.
- "Statistical monitoring": If a grader's afternoon scores = consistently higher than morning โ flagged.
- "Double-blind": 2 graders = averages out individual fatigue effects.
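Two of these countermeasures are easy to sketch. This minimal example assumes that essays are identified by string ids. It approximates "consistently higher" by the gap between a grader's afternoon and morning mean scores. The 1-point `max_gap` and both function names are hypothetical.

```python
import random
from statistics import mean
from typing import Optional


def shuffled_queue(essay_ids: list[str], seed: Optional[int] = None) -> list[str]:
    """Random grading order, so the queue isn't sequential by student."""
    rng = random.Random(seed)
    queue = list(essay_ids)   # copy: leave the caller's list untouched
    rng.shuffle(queue)
    return queue


def drift_flag(morning: list[float], afternoon: list[float], max_gap: float = 1.0) -> bool:
    """True if afternoon scores average more than `max_gap` above morning scores."""
    return mean(afternoon) - mean(morning) > max_gap


ids = [f"essay-{i}" for i in range(10)]
print(shuffled_queue(ids, seed=7))
print(drift_flag([40, 42, 41], [43, 44, 42]))  # True: afternoon mean is 2 points higher
```

Random order also helps with the anchor effect: a given student's essay is no more likely to follow a strong essay than a weak one.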
Western Case: SAT Grading vs. Gaokao Grading
The "Standardized Test Grading" Comparison
| Aspect | **SAT Essay Grading (U.S., discontinued 2021)** | **Gaokao Essay Grading (China)** |
|--------|------------------------------|----------------------------------|
| Graders per essay | 2 | 2 (+ 3rd if they disagree) |
| Grader training | ~1 day online | 2 days in-person + calibration |
| "Lockdown" security | None (grade from home) | Full lockdown (hotel, no phones) |
| Statistical monitoring | Minimal | Real-time dashboards + anomaly flags |
| Appeal process | Complex ($55 fee) | Free, ~0.3% success rate |
| "Grader fatigue" controls | None | Mandatory breaks + speed monitoring |
The "which is more fair?" answer:
- Gaokao = more rigorous (lockdown, calibration, double-blind + arbitration, statistical monitoring).
- SAT = more convenient (grade from home) but less secure (no lockdown, no anomaly detection).
- Result: Gaokao grading = more fair (procedurally), SAT grading = more convenient (but less controlled).
Anti-Superstition: "Graders Don't Read Every Word"
The Myth
The myth: "Gaokao essay graders spend ~30 seconds per essay. They just glance at handwriting and length."
The reality (the data):
- Average time per essay: ~2-4 minutes (not 30 seconds).
- "Handwriting bias": Real; studies show neat handwriting = +2-5 points (unconscious bias).
- "Length bias": Also real; longer essays = slightly higher scores (on average).
The "handwriting bias" (neuroscience):
- fMRI study (Tsukiura et al., 2017): Neat handwriting → ventral striatum (positive reward) + prefrontal cortex (competence judgment) activation → unconscious bias toward higher scores.
- Translation: Neat handwriting = brain says "this person is competent" → higher score (even if content = same).
The Gaokao's countermeasure:
- "Scanned" grading: Essays are scanned into computers, and graders read them on screens rather than on paper. The scan still preserves the handwriting's appearance.
- Result: Handwriting bias = reduced but not eliminated.
The "Statistical Anomaly Detection" (How They Catch Cheating)
The Three Systems
1. "Similarity Detection" (相似度检测)
- Software compares all 13 million essays → flags suspiciously similar answers.
- If 2+ essays in the same exam room = >80% identical → investigation.
2. "Score Distribution Analysis" (分数分布分析)
- Normal distribution: Gaokao scores should follow a bell curve.
- If a room's average = >2 SD from expected → investigation.
3. "CCTV Monitoring" (监控录像)
- Every exam room = recorded on CCTV.
- If an investigation is triggered: Review CCTV footage → look for cheating.
- Result: ~0.01% of scores = overturned per year.
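At their core, Systems 1 and 2 are a pairwise text comparison and an outlier test. This minimal Python sketch uses `difflib.SequenceMatcher` as a stand-in for the actual similarity software, which the source does not name. It interprets "expected" as the mean across all rooms. `similar_pairs` and `flag_rooms` are hypothetical names.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean, stdev


def similar_pairs(room_essays: dict[str, str],
                  cutoff: float = 0.8) -> list[tuple[str, str, float]]:
    """System 1: flag pairs of essays in one exam room that are > cutoff similar."""
    flagged = []
    for (id_a, text_a), (id_b, text_b) in combinations(room_essays.items(), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()  # 0.0 .. 1.0
        if ratio > cutoff:
            flagged.append((id_a, id_b, round(ratio, 3)))
    return flagged


def flag_rooms(room_means: dict[str, float], z_limit: float = 2.0) -> list[str]:
    """System 2: flag rooms whose average is > z_limit SDs from the all-room mean."""
    values = list(room_means.values())
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return []
    return [room for room, m in room_means.items() if abs(m - mu) / sd > z_limit]


room = {
    "s1": "the quick brown fox jumps over the lazy dog",
    "s2": "the quick brown fox jumps over the lazy dog",
    "s3": "12345 67890",
}
print(similar_pairs(room))  # [('s1', 's2', 1.0)]

rooms = {f"R{i}": m for i, m in enumerate(
    [498, 502, 500, 499, 501, 500, 503, 497, 500, 560], start=1)}
print(flag_rooms(rooms))    # ['R10']
```

`SequenceMatcher` works character by character and gets expensive on long texts. Checking only within one room keeps the number of pairs small; a nationwide scan would more plausibly use shingling or MinHash-style hashing. A single outlier also inflates the SD it is measured against, so a robust variant would use the median and the median absolute deviation.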
FAQ
Q: Can I appeal my Gaokao score?
A: Yes (free). But success rate = ~0.3% (grading errors = extremely rare due to double-blind system).
Q: Is there really a "handwriting bias"?
A: Yes (~+2-5 points for neat handwriting). Countermeasure: scanned grading reduces but doesn't eliminate it.
Q: Do graders really have no phones for 14 days?
A: Yes. Phones are collected at entry and returned at exit; emergency contacts go through the hotel front desk only.
Resources
- Ministry of Education (China): http://www.moe.gov.cn/
- Vohs et al. (2014), "Decision Fatigue," Journal of Personality and Social Psychology
- Tversky & Kahneman (1974), "Judgment under Uncertainty: Heuristics and Biases," Science