Preliminary version. This article is currently undergoing editorial revision.
The Experiment
One interview. Three AI models. No exchange between them.
Josef Kraus, a grammar school teacher (Gymnasiallehrer) for 35 years and President of the German Teachers’ Association (Deutscher Lehrerverband) for 30 years, gave WELT an interview about the crisis of the German education system. His core thesis: lowering performance requirements in the name of “child-appropriateness” (Kindgerechtigkeit) endangers children’s futures.
We had the same interview analyzed independently by Claude (Anthropic), Copilot (Microsoft/OpenAI) and Gemini (Google) — with an identical assignment: Fact-check everything. Test theses for their empirical robustness. Develop alternative explanations. Create a report.
The result is instructive — not only for the education debate, but also for the question of how differently AI models approach the same task.
Where all three agree
Despite differing methodology, all three models arrive at remarkably similar core findings:
1. The performance decline is real. The IQB Education Trend 2022/2024 (IQB-Bildungstrend, a major German educational assessment) documents declining competencies in mathematics, natural sciences, and German. All three models classify this finding as robust.
2. The teacher shortage is structural. All three confirm the problem and point out that Kraus’s figure (70,000) is only one estimate: depending on the source, the number ranges from 49,000 to 150,000.
3. The infrastructure decay is massive, and Kraus underestimates it. All three identify Kraus’s “10 billion” as false or significantly too low; the KfW figures (from Germany’s state-owned development bank) range from 45 to 68 billion euros. Correcting the error actually strengthens his own argument.
4. The police dictation figures are correct. All three models confirm the 39% failure rate and that 30% of those were high school graduates (Abiturienten).
5. Grade inflation (Noteninflation) exists. The widening gap between better grades and worse competency measurements is documented.
6. Migration as a cause is overly simplistic. All three models warn: Correlation is not causation. The “migration effect” is to a large extent an effect of poverty and language.
Where they differ
Fact Check: Three models, three yardsticks
| Claim | Claude | Copilot | Gemini |
|---|---|---|---|
| Berlin: 90% migrant share | CORRECT (19 schools > 90%) | “unclear” | Partially robust |
| Lower Saxony: Written division cancelled | CORRECT (postponed to secondary school) | “disputed” | False / Misleading |
| KfW renovation figure | FALSE (10 instead of 67.8 bn) | “probably false (~54.8 bn)” | False (47 bn) |
| Metal detectors at schools | (not checked) | “anecdotal” | Misleading (imported US debate) |
The Lower Saxony case is particularly instructive: All three confirm the same fact (written division is postponed to secondary school) but evaluate Kraus’s framing completely differently. Claude says: “He claims it is being cancelled, and that is true for primary school.” Gemini says: “He frames it in a misleading way, because it is a didactic modernization, not a lowering of standards.” Both are right; which verdict applies depends on the frame of reference.
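The comparison in the fact-check table can be sketched in a few lines of Python. This is a minimal illustration, not the method the editorial team used: the verdict labels are transcribed from the table, while the coarse scale (`supported`, `mixed`, `open`, `rejected`) and the mapping onto it are assumptions made here for the example. The names `VERDICTS`, `SCALE`, `normalize`, and `compare` are hypothetical.

```python
# Verdicts transcribed from the fact-check table (short labels only).
VERDICTS = {
    "Berlin: 90% migrant share": {
        "Claude": "CORRECT", "Copilot": "unclear", "Gemini": "Partially robust"},
    "Lower Saxony: written division": {
        "Claude": "CORRECT", "Copilot": "disputed", "Gemini": "False / Misleading"},
    "KfW renovation figure": {
        "Claude": "FALSE", "Copilot": "probably false", "Gemini": "False"},
    "Metal detectors at schools": {
        "Claude": "(not checked)", "Copilot": "anecdotal", "Gemini": "Misleading"},
}

# ASSUMPTION: a shared coarse scale onto which each model's wording is mapped.
# The mapping is an editorial judgment for this sketch, not model output.
SCALE = {
    "correct": "supported",
    "partially robust": "mixed",
    "unclear": "open",
    "disputed": "open",
    "anecdotal": "open",
    "false": "rejected",
    "probably false": "rejected",
    "false / misleading": "rejected",
    "misleading": "rejected",
    "(not checked)": None,  # no verdict given; excluded from the comparison
}


def normalize(label):
    """Map a model's verdict wording onto the coarse scale (case-insensitive)."""
    return SCALE[label.strip().lower()]


def compare(verdicts):
    """Return, per claim, the normalized verdicts and whether all checked models agree."""
    report = {}
    for claim, by_model in verdicts.items():
        coarse = {model: normalize(label) for model, label in by_model.items()}
        checked = {v for v in coarse.values() if v is not None}
        report[claim] = {"coarse": coarse, "agree": len(checked) == 1}
    return report


if __name__ == "__main__":
    for claim, row in compare(VERDICTS).items():
        status = "agree" if row["agree"] else "DIFFER"
        print(f"{status:6} {claim}: {row['coarse']}")
```

Under this particular mapping, only the KfW row counts as agreement; the other three rows are flagged as differing. A different mapping (for example, treating “misleading” as `mixed` rather than `rejected`) would leave the flags unchanged here but could change them for other claims, which is exactly the yardstick problem the table illustrates.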
Three Personalities, Three Analysis Styles
| Dimension | Claude | Copilot | Gemini |
|---|---|---|---|
| Approach | Empirical-quantitative | Epistemic-methodical | Political-rhetorical |
| Strength | Sourcing thoroughness, effect sizes | Humility, “proven vs. plausible” | Rhetoric analysis, recognizing decadence narrative |
| Weakness | Fact focus can come across as confirming Kraus | Classifies the provable as “unclear” | Political judgments within the analysis |
| Tone | Factual-sober | Cautious-academic | Pointed-political |
Claude actively searches for sources and provides numbers: of 13 claims checked, 9 were confirmed and 1 refuted. Hattie effect sizes, PISA points, KfW billions: everything is quantified and verifiable.
Copilot differentiates more cleanly between “plausible” and “proven”. “A description of a trend does not constitute proof of causality” — methodologically the most mature statement. However, some things remain unchecked.
Gemini starts with a rhetorical analysis that Claude and Copilot completely lack: “Decadence narrative”, “polarization”, “interweaving of real symptoms with personal interpretations”. This is a valuable contribution — but Gemini itself slides into political judgments (“populist”, “backward-looking”) that seem out of place in a factual analysis.
Overall Judgment in Comparison
| | Claude | Copilot | Gemini |
|---|---|---|---|
| Kraus’s Facts | 9/13 correct | “Strongest are robust” | “Correct in basic observations” |
| Kraus’s Theses | “Partially correct, undercomplex” | “Viable + overstatements” | “Populist, backward-looking” |
| Main Criticism | Monocausal, multi-level model needed | Causal reduction, confounding | Blames identity politics instead of structures |
| Tone of Judgment | Factual-distanced | Methodical-cautious | Political-judging |
What we learn from this — about education
The education crisis is real. All three models confirm this independently of each other. But the question of causes is complex. A multi-level model explains the findings better than Kraus’s culture-critical grand narratives:
1. Structural underfunding: a renovation backlog of 47–68 billion, educational expenditure below the OECD average
2. Teacher shortage: quantitative and qualitative (10.5% without recognized exams)
3. Corona shock: 35% learning loss according to a meta-analysis, especially for the disadvantaged
4. Socioeconomic segregation: child poverty as the main predictor, the Gymnasium as the standard school, the problem of “residual schools”
5. Educational expansion without structural adjustment: over 50% aim for the Abitur, leaving Haupt-/Realschulen with concentrated problems
6. Missing societal prerequisites (Maslow): basic needs not met, all-day schools without recreational concepts, decrepit infrastructure
7. Increasing complexity with the same resources: the canon of knowledge is growing, the timetable is not, and staffing is shrinking
8. Partially problematic pedagogical trends: uncritical digitalization, lowering of standards
Factor 8 is the part that Kraus addresses. But it only explains part of the variance — and factors 1–7 are empirically stronger.
What we learn from this — about AI analysis
Three models, the same material, a similar basic diagnosis — but three recognizably different “personalities”:
- Claude verifies thoroughly and remains factual and sober
- Copilot cleanly separates “proven” and “plausible”
- Gemini recognizes rhetorical strategies and political narratives — but becomes political itself in the process
The spectrum is revealing: From empirical-quantitative (Claude) to methodical-cautious (Copilot) to political-judging (Gemini). None of the three analyses hallucinates facts. But they weight, frame, and judge very differently.
The most important takeaway: A single AI model provides a single perspective. Only comparison makes the blind spots visible — both those of the interviewee and those of the analysts.
In the Follow-up Discussion: Two Hypotheses that Kraus Misses Completely
In the editorial follow-up discussion with Gemini, two further explanatory approaches emerged that significantly expand the hypothesis space:
Maslow Hypothesis: Performance needs prerequisites
Kraus demands effort. But effort presupposes that basic needs are met. No food at home, no quiet place to learn, a stressful journey to school, whole days spent in dilapidated buildings without recreational spaces: under these conditions, the “achievement principle” is an empty appeal. According to the Bertelsmann Foundation, child poverty is a stronger predictor of school failure than origin, language, or pedagogy.
Kraus names the symptoms himself (dilapidated toilets, a renovation backlog of tens of billions) but does not draw the conclusion. Demanding more performance without providing the prerequisites is a non sequitur.
Overload Hypothesis: Standards are rising, not falling
The history of the curricula issued by the KMK (Kultusministerkonferenz, the Standing Conference of the Ministers of Education) shows a massive paradigm shift: Before 2000, input curricula applied (catalogues of subject matter, reproduction: “Memorize the capital”). After the PISA shock triggered by the PISA 2000 study, the KMK switched to output standards (competency orientation, transfer: “Evaluate complex non-fiction texts and critically examine digital sources”). On top of all existing subject standards came 60+ individual competencies from the KMK strategy “Education in the Digital World” (2016).
A rote learner of the 1970s would fail at today’s competency expectations. But the timetable has hardly grown: the same amount of time, considerably more and more complex material, fewer staff.
This turns Kraus’s core thesis upside down: Not “school has become too easy”, but school is trying to teach cognitively more demanding things than ever before — and is stumbling over the fraying social edges of society in the process.
Added to this is a fundamental design problem — and it goes deeper than just a shift of responsibility from teacher to student. It is a shift of categories:
The old curriculum said: “Cover A, B, C”, where A = Weimar Republic, B = Treaty of Versailles, C = Global Economic Crisis of 1929. Concrete, countable topics. Finite. Checkable. Once the teacher had covered them, the job was done. Once the student had learned them, they could pass.
The new standard says: “The student can do A’, B’, C’”, where A’ is no longer “Weimar Republic” but the ability to deconstruct the failure of any democracy from multiple perspectives. B’ is not “Treaty of Versailles” but the competency to analyze international treaties based on criteria. C’ is not “Global Economic Crisis of 1929” but the transfer ability to evaluate the social consequences of economic crises.
The difference: A was a topic. A’ is everything one needs to be able to do in order to deal with any topic like A. A is finite. A’ is potentially infinite.
What this means:
- The student can never “finish learning” A’, because A’ is not material, but an ability
- The teacher can never “prove” that he has taught A’, because A’ manifests itself differently in every topic
- The exam can test A’ on any randomized topic — so the student doesn’t know what to practice at their desk
- “Do I have it now?” can be answered clearly for A, but never for A’.
This is not a simplification. This is a dissolution of boundaries — and it explains why both sides burn out: The student loses their footing (no checking off, no “finished”), the teacher loses their relief (no “I’ve done my part”). Not because school has become too easy, but because it has become categorically different — without anyone having adapted the resources, the training, or the infrastructure.
Editorial Contextualization
Where do we stand as an editorial team? We share the basic diagnosis of all three models: The education crisis is real, the facts are largely correct, but the attribution of causality is too simple.
What we add: Kraus’s interview is a symptom of the discourse, not just a subject of analysis. The debate about education in Germany has been conducted as a culture war for decades (achievement vs. child-appropriateness, tradition vs. reform, discipline vs. democracy) — instead of as empirically informed structural policy. All three AI models show: The evidence does not fully support one camp or the other. It supports a multi-level model that is less catchy, but closer to the truth.
What is missing in the debate: Sociological voids
Kraus talks about school as if it were a closed system. But school is embedded in a society, and it is precisely this society that is failing to deliver:
- Housing crisis: Families in overcrowded apartments, children without their own desks. According to Bertelsmann, 20.8% of children live in poverty — and the trend is rising. In Bremen: 31.9%.
- Care crisis: If both parents have to work to pay the rent, no one is there to help with homework. The all-day school (Ganztagsschule) doesn’t compensate for this; it merely prolongs the time spent in often dilapidated buildings without a recreational concept.
- Nutrition: According to the brotZeit study 2023, every fourth child nationwide comes to school without breakfast. Neurobiologically, learning is not possible without glucose — Maslow’s pyramid is not a theory, but brain physiology.
- Early selection: Germany sorts students after grade 4 (at age 10). PISA data show that socioeconomic status explains 19% of the performance variance in Germany; in Canada it is under 7%. That is not a difference in talent. It is a system design that translates poverty into educational poverty.
- The invisible curriculum: The new KMK standards demand “multiperspectival judgment competency” and “criteria-guided analysis”. Whoever doesn’t know this language from home fails — not due to a lack of intelligence, but due to a lack of cultural capital (Bourdieu). Competency orientation without equal opportunity is selection through abstraction.
Under these conditions, Kraus’s demand for “more performance” is like demanding that runners “run faster” while taking their shoes away.
Editorial Note: The Democracy Paradox
Kraus’s criticism of “democratic school” is particularly disconcerting. He puts “democratic” in quotation marks and presents it as a cause of declining performance. Two problems with this:
Firstly — Constitutional conformity: Article 7 of the Basic Law (Grundgesetz) and the school laws of all federal states define democracy education as the core mandate of the school. If a former President of the Teachers’ Association presents democracy in school as a problem, he is operating in an area of tension with the constitutional mandate. School must educate democratically — this is not a pedagogical fad, but applicable law.
Secondly — Circularity: Kraus criticizes something that was never implemented comprehensively. “Democratic school” in the sense of co-determination, participation, and self-efficacy exists at the vast majority of German schools at best as an ideal — not as lived practice. To name something as the cause of problems that was never really implemented is circular reasoning.
The Individual Analyses
- Claude Analysis: Fact Check & Hypothesis Space →
- Copilot Analysis: Critical Contextualization →
- Gemini Analysis: Rhetoric & Hypothesis Space →
Editorial team: LG | Models: Claude Opus 4.6, Microsoft Copilot (GPT-4o), Google Gemini 3.1 Pro | Method: Multi-Model Comparison (MMV) | 2026-04-06