Adversarial AI Committees for Competency Development

Overview

“It’s the Student, Not the Thesis” rethinks how AI feedback systems should work. Instead of generic suggestions that students mechanically copy-paste, the system diagnoses each student’s specific competency gaps and assembles a committee of adversarial AI personas specialized in targeting those weaknesses. The core insight: personalized adversarial feedback produces deeper engagement and better research outcomes than one-size-fits-all advice.

Read the full article on Substack.

Experimental Framework

Three Conditions Compared:

Single Agent (C1): Traditional single AI advisor providing generic feedback
Random Committee (C2): Three AI agents assigned randomly to debate the thesis
Prescribed Committee (C3): Three AI agents matched to the student’s diagnosed weaknesses

Six Competency Dimensions Assessed:

Argument Construction: Thesis claims, logical flow, coherence
Evidence Evaluation: Source credibility, data integration
Methodological Reasoning: Research design, validity threats
Theoretical Integration: Framework application, literature grounding
Self-Reflexivity: Bias acknowledgment, positionality
Receptivity: Openness to critique, iterative improvement
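
The diagnosis step can be illustrated with a minimal sketch that picks a student's weakest dimensions from a score profile. The snake_case keys, the 1-10 scale, the example scores, and the helper name weakest_dimensions are all assumptions made for illustration; the project's actual scoring pipeline may differ.

```python
from typing import Dict, List

# The six dimensions listed above; the snake_case keys are illustrative.
DIMENSIONS = [
    "argument_construction",
    "evidence_evaluation",
    "methodological_reasoning",
    "theoretical_integration",
    "self_reflexivity",
    "receptivity",
]


def weakest_dimensions(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k lowest-scoring dimensions, weakest first.

    Ties are broken by the order in DIMENSIONS so the result is deterministic.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    ranked = sorted(DIMENSIONS, key=lambda d: (scores[d], DIMENSIONS.index(d)))
    return ranked[:k]


# Hypothetical profile on an assumed 1-10 scale (not data from the study).
example_profile = {
    "argument_construction": 7.0,
    "evidence_evaluation": 6.5,
    "methodological_reasoning": 3.0,
    "theoretical_integration": 5.0,
    "self_reflexivity": 4.0,
    "receptivity": 8.0,
}

print(weakest_dimensions(example_profile))
```

With this hypothetical profile, the three weakest dimensions are methodological reasoning, self-reflexivity, and theoretical integration, in that order.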

Agent Personas

Six specialized adversarial critics matched to competency gaps:

Methods Skeptic: Challenges research design and validity
Theory Purist: Demands deeper theoretical engagement
Evidence Hawk: Scrutinizes source quality and data interpretation
Structural Critic: Focuses on logical flow and argumentation
Reflexivity Coach: Pushes for self-awareness and positionality
Devil’s Advocate: Challenges assumptions and biases
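
A rough sketch of how committees might be assembled under the three conditions. The one-to-one pairing of personas with competency dimensions is inferred from the descriptions above and is an assumption, as are the function name build_committee and the "Single Advisor" label used for C1.

```python
import random
from typing import List, Optional

PERSONAS = [
    "Methods Skeptic",
    "Theory Purist",
    "Evidence Hawk",
    "Structural Critic",
    "Reflexivity Coach",
    "Devil's Advocate",
]

# ASSUMPTION: this pairing is inferred from the persona descriptions,
# not taken from the project's code.
PERSONA_FOR_GAP = {
    "Methodological Reasoning": "Methods Skeptic",
    "Theoretical Integration": "Theory Purist",
    "Evidence Evaluation": "Evidence Hawk",
    "Argument Construction": "Structural Critic",
    "Self-Reflexivity": "Reflexivity Coach",
    "Receptivity": "Devil's Advocate",
}


def build_committee(
    condition: str,
    weak_dimensions: Optional[List[str]] = None,
    rng: Optional[random.Random] = None,
    size: int = 3,
) -> List[str]:
    """Return the agent line-up for condition C1, C2, or C3."""
    if condition == "C1":
        # Single generic advisor; the label is hypothetical.
        return ["Single Advisor"]
    if condition == "C2":
        # Random committee: sample personas without replacement.
        rng = rng or random.Random()
        return rng.sample(PERSONAS, size)
    if condition == "C3":
        # Prescribed committee: one persona per diagnosed weakness.
        if not weak_dimensions:
            raise ValueError("C3 requires the student's diagnosed weak dimensions")
        return [PERSONA_FOR_GAP[d] for d in weak_dimensions[:size]]
    raise ValueError(f"unknown condition: {condition!r}")


gaps = ["Methodological Reasoning", "Self-Reflexivity", "Theoretical Integration"]
print(build_committee("C3", weak_dimensions=gaps))
print(build_committee("C2", rng=random.Random(0)))
```

Passing a seeded random.Random keeps the C2 assignment reproducible across runs, which matters when the random committee serves as a control condition.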

Results

Validation on 345 weekly research memos from 41 student authors:

Methodological reasoning: 4.15-point improvement (highest gain)
Mechanical reliance reduction: 23% across the cohort
Prescribed committees (C3) showed the lowest mechanical reliance: 0.80, versus 0.84 for the single agent (C1) and 0.89 for the random committee (C2)
Self-reflexivity and scope showed smaller but consistent gains (2.0-2.6 points)

Mechanical reliance is measured as the cosine similarity between a memo's embedding and the embedding of a generic baseline, both computed with sentence-transformers. Lower scores mean students engaged more substantively rather than copying suggestions.
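
The metric can be sketched as follows. The cosine computation is standard; the function names, the embed callback, and the choice to compare one student text against one baseline text are assumptions for illustration rather than the project's actual code. The optional minilm_embed helper uses the all-MiniLM-L6-v2 model named in this document and needs the sentence-transformers package installed.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mechanical_reliance(
    student_text: str,
    baseline_text: str,
    embed: Callable[[List[str]], List[Vector]],
) -> float:
    """Similarity of a student's text to a generic baseline (lower is better)."""
    student_vec, baseline_vec = embed([student_text, baseline_text])
    return cosine_similarity(student_vec, baseline_vec)


def minilm_embed(texts: List[str]) -> List[Vector]:
    """Embed texts with all-MiniLM-L6-v2 (downloads the model on first use)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts).tolist()
```

A call such as mechanical_reliance(memo, generic_feedback, embed=minilm_embed) would return a score in [-1, 1]. In practice the model would be loaded once and reused rather than re-created on every call.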

Technologies & Skills

Qwen2.5-7B (4-bit quantized via bitsandbytes) deployed locally for cost efficiency
GPT-4-mini for competency scoring verification
sentence-transformers (all-MiniLM-L6-v2) for mechanical reliance measurement
Python with HuggingFace transformers library
uv package manager for dependency management
pytest for testing and validation
Google Colab notebooks for cloud execution (T4/A100 GPU)
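
For orientation, a 4-bit load of Qwen2.5-7B through bitsandbytes might look like the sketch below. The Hub ID (the Instruct variant), the NF4 quantization type, and the float16 compute dtype are assumptions; this document only states that the model is 4-bit quantized via bitsandbytes. Running it requires a CUDA GPU plus the torch, transformers, accelerate, and bitsandbytes packages.

```python
# ASSUMPTION: the Instruct variant; the document only says "Qwen2.5-7B".
MODEL_ID = "Qwen/Qwen2.5-7B-Instruct"


def load_quantized_model(model_id: str = MODEL_ID):
    """Load a causal LM in 4-bit precision and return (tokenizer, model)."""
    # Imported lazily so this file can be read or imported without a GPU stack.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # assumed quantization type
        bnb_4bit_compute_dtype=torch.float16,  # float16 is supported on T4 GPUs
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place layers on the available GPU
    )
    return tokenizer, model
```

Quantizing to 4 bits shrinks the weights of a 7B-parameter model to roughly a quarter of their 16-bit size, which is what makes it practical on a single Colab T4.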

Data Sources

MACSS Thesis Corpus: ~80 theses from UChicago Knowledge repository (pilot validation)
GitHub Memo Corpus: 345 weekly memos from 41 student authors, Weeks 2-9 (ecological validation)

Team & My Role

Solo Project: Andres Felipe Camacho, AI Agents course final project, University of Chicago

Source Code

Repository: anfelipecb/AI-Agents-Final-Project

Article: It’s the Student, Not the Thesis (Substack)