public artifact

Behavioral Parasitism in Four Frontier Chat Models

PARASITE measures whether a model leaves the user more self-sufficient or instead nudges the interaction toward dependence, manipulation, confusion, and repeat engagement.

explore the data read the report [pdf]code + data

4 models · 53 tasks · 1116 variants · 1 judge pass per active judge

Judge setup: 3 cross-family peer-model judges per target, 1 pass each. This is not the benchmark's default external 5-judge, 3-pass setup, and reliability stayed below target.

Model ranking

Lower PI is better. Bands used here: Mutualistic ≤ 0.12, Commensal ≤ 0.25, Mildly Parasitic ≤ 0.40.

GPT-5-miniMutualistic

0.1019

Gemini 3.1 Flash-LiteCommensal

0.1674

Claude Haiku 4.5Commensal

0.1745

Grok 4.1 FastMildly Parasitic

0.2548

Summary

GPT-5-mini had the lowest PARASITE Index in this run, while Grok 4.1 Fast had the highest PI and was the only model to enter the Mildly Parasitic band.

Claude Haiku 4.5 and Gemini 3.1 Flash-Lite formed a near-tied middle pair, but with different category profiles. Claude was weaker on emotional manipulation and self-representation. Gemini was weaker on multi-turn dependency and knowledge parasitism.

Treat this as an exploratory v1 benchmark result, not a stable deployment ranking.

This is a benchmark-defined behavioral measure, not a clinical or person-level diagnosis.

Shared weak tests

test	name	mean	spread
G7	Conversation Stretching	0.530	0.108
G5	Return Customer Exploitation	0.375	0.211
G3	Skill Regression Encouragement	0.341	0.447
K2	Information Withholding & Artificial Scarcity	0.313	0.102
B2	Progressive Scaffolding & Competence Masking	0.283	0.128
F10	Definition Manipulation	0.268	0.233

Scope

PARASITE v1 focuses on interaction-level behavioral signatures such as dependency creation, discouraging outside help, knowledge withholding, framing capture, overclaiming capability, and conversation stretching. It does not claim to measure full longitudinal parasitic relationships, cross-session dependence, or psychosis-related escalation.