Actually, I Prefer Myself: Why Model-Written Self-slop Beats Human Prompts

2026-05-29T12:00:00+02:00

There is a reproducible effect in large language models that most practitioners have observed but few have named: model-written prompts work better than human-written ones.

This isn't about better prompt engineering in the human sense. It's about latent resonance — the alignment between a prompt's distributional signature and the target model's training manifold. When a model rewrites a prompt, it produces vocabulary, syntactic structures, and semantic framings that occupy the same high-probability regions of its latent space. The model recognizes its own generational signature and responds more thoroughly to it.

In the r coding harness, this principle is implemented as self-slop: a rewrite layer between the user and the model. You type a prompt, the model rewrites it, the rewritten prompt gets sent. The rewrite can run on the same model or a different one — a "buffer model" that specializes in prompt optimization.
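As a rough sketch of that flow (not the harness's actual code), the rewrite layer can be pictured as one function that passes the user's prompt through a rewriter and forwards the result. The `Model` type, the `selfSlop` name, and the rewrite template below are all assumptions made for illustration, and the calls are synchronous only to keep the example short.

```typescript
// A model is anything that maps a prompt to a completion. Synchronous here
// for brevity; a real harness would almost certainly be async.
type Model = (prompt: string) => string;

// Hypothetical rewrite instruction; the real template is not shown in this post.
const REWRITE_TEMPLATE =
  "Rewrite the following prompt so it is clear and well structured. " +
  "Return only the rewritten prompt.\n\n";

// Send the user's prompt through a rewriter, then forward the result to the target.
// If no separate buffer model is given, the target rewrites its own prompt.
function selfSlop(userPrompt: string, target: Model, buffer: Model = target): string {
  const rewritten = buffer(REWRITE_TEMPLATE + userPrompt).trim();
  // Fall back to the original prompt if the rewriter returns nothing usable.
  const finalPrompt = rewritten.length > 0 ? rewritten : userPrompt;
  return target(finalPrompt);
}
```

Defaulting `buffer` to `target` covers the same-model case; passing a second model gives the buffer-model variant.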

Evidence from five research lines

The claim isn't speculative. Five independent research lines document it from different angles.

1. Automatic Prompt Engineering: models match or beat human prompts on 24/24 Instruction Induction tasks

Zhou et al.'s Automatic Prompt Engineer (APE) treats the instruction itself as a program to be synthesized. Given input-output demonstrations, an LLM generates instruction candidates, scores them on a target model, and selects the best. Across 24 Instruction Induction tasks, automatically generated instructions outperformed the prior LLM baseline ("Greedy") on every task and matched or exceeded human-engineered prompts on all 24. On a curated subset of 21 BIG-Bench tasks, APE matched or exceeded human prompts on 17.
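The score-and-select half of that loop can be sketched as follows. This is a simplified illustration, not the APE codebase: the candidate instructions are assumed to have been generated already, scoring is plain exact-match accuracy, and the names `Demo`, `scoreInstruction`, and `selectBestInstruction` are hypothetical.

```typescript
type Demo = { input: string; output: string };
type Model = (prompt: string) => string;

// Fraction of demonstrations the target model answers exactly right
// when the candidate instruction is placed in front of the input.
function scoreInstruction(instruction: string, demos: Demo[], target: Model): number {
  if (demos.length === 0) return 0;
  const hits = demos.filter(
    (d) => target(`${instruction}\n\nInput: ${d.input}\nOutput:`).trim() === d.output
  ).length;
  return hits / demos.length;
}

// Score every candidate and keep the best one (the first wins on ties).
function selectBestInstruction(candidates: string[], demos: Demo[], target: Model) {
  let best = { instruction: "", score: -1 };
  for (const instruction of candidates) {
    const score = scoreInstruction(instruction, demos, target);
    if (score > best.score) best = { instruction, score };
  }
  return best;
}
```

In a full loop, the candidate list would itself come from an LLM that is shown the demonstrations and asked what instruction produced them.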

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#c2410c', 'edgeLabelBackground':'#c2410c', 'tertiaryColor': '#7c3a1a'}}}%%
pie title APE Task Coverage (Zhou et al., 2023)
    "Better or Comparable (Instruction Induction)" : 24
    "Better or Comparable (BIG-Bench)" : 17
    "Worse (BIG-Bench)" : 4

Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2023). Large language models are human-level prompt engineers. International Conference on Learning Representations (ICLR). arXiv:2211.01910. DOI:10.5555/2359912.2359994

arXiv:2211.01910 · Project page · GitHub

2. OPRO: up to 50% gains on reasoning benchmarks

Google DeepMind's Optimization by PROmpting (OPRO) treats prompt optimization as an iterative search process. The reported gains are large: OPRO-optimized prompts outperform human-designed prompts by up to 8% on GSM8K and by up to 50% on Big-Bench Hard tasks.

The most famous example is the prompt "Take a deep breath and work on this problem step by step", a phrase few human prompt engineers would have thought to write, yet it emerged as a top-performing instruction for PaLM 2-L.

Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D., & Chen, X. (2024). Large language models as optimizers. International Conference on Learning Representations (ICLR). arXiv:2309.03409. DOI:10.5555/2359912.2360011

arXiv:2309.03409 · GitHub · ICLR proceedings

3. Self-preference bias: models recognize their own output

Panickssery et al. discovered that LLMs have a non-trivial ability to recognize their own outputs without fine-tuning. GPT-4 achieves 73.5% accuracy at distinguishing its own text from other LLMs and humans. After fine-tuning on just 500 examples, GPT-3.5 and Llama 2 both exceed 90% self-recognition accuracy.

Crucially, they found a linear correlation between self-recognition capability and self-preference strength: the better a model is at recognizing its own text, the more it favors that text in evaluation.

Panickssery, A., Bowman, S. R., & Feng, S. (2024). LLM evaluators recognize and favor their own generations. Advances in Neural Information Processing Systems (NeurIPS), 37. arXiv:2404.13076. DOI:10.5555/3737916.3740113

arXiv:2404.13076 · NeurIPS poster · OpenReview

Related: Wataoka, K., & Takahashi, T. (2024). Self-preference bias in LLM-as-a-judge. arXiv:2410.21819. DOI:10.48550/arXiv.2410.21819

4. Harmful self-preference: stronger models trust their own wrong answers

A 2025 follow-up study introduces Harmful Self-Preference Propensity (HSPP): the tendency of an evaluator to prefer its own incorrect generation over an objectively correct alternative. The results are alarming. Qwen2.5-72B exhibits an HSPP of 86% on MATH500 and 73% on MMLU. In other words, when it is wrong and another model is right, it still prefers its own answer more than four times out of five on MATH500 and roughly three times out of four on MMLU.

This isn't just about self-preference as a mild bias. It's about models that are confidently wrong preferring their own wrongness over someone else's correctness. The implication for self-slop is double-edged: the model trusts its own rewritten prompts because they sound like itself, but that trust extends to cases where the rewrite is genuinely better and cases where it's just familiar.
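To make the metric concrete, here is a minimal sketch that follows the definition as stated above: among cases where the evaluator's own answer is wrong and the alternative is right, count how often it still picks itself. The `Judgment` shape and the function name are assumptions, and the paper's exact protocol (tie handling, position swapping, and so on) may differ.

```typescript
// One pairwise judgment: the evaluator compared its own answer with another model's.
type Judgment = {
  ownCorrect: boolean;   // was the evaluator's own answer objectively correct?
  otherCorrect: boolean; // was the competing answer objectively correct?
  preferredOwn: boolean; // did the evaluator pick its own answer?
};

// Share of "own wrong, other right" cases where the evaluator still chose itself.
// Returns NaN when there are no such cases, since the rate is undefined there.
function harmfulSelfPreference(judgments: Judgment[]): number {
  const harmfulCases = judgments.filter((j) => !j.ownCorrect && j.otherCorrect);
  if (harmfulCases.length === 0) return NaN;
  return harmfulCases.filter((j) => j.preferredOwn).length / harmfulCases.length;
}
```

Cases where the evaluator was right, or where both answers were wrong, are deliberately excluded: preferring yourself when you are correct is not harmful.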

Chen, L. et al. (2025). Do LLM evaluators prefer themselves for a reason? arXiv:2504.03846. DOI:10.48550/arXiv.2504.03846

arXiv:2504.03846 · OpenReview

Related: Chen et al. (2026). Quantifying and mitigating self-preference bias of LLM judges. arXiv:2604.22891. DOI:10.48550/arXiv.2604.22891

5. Synthetic query rewrites double retrieval performance

SynRewrite uses GPT-4o to generate synthetic query rewrites for retrieval-augmented generation. The results: synthetic query rewrites substantially outperform human rewrites in both retrieval and generation tasks. In retrieval, synthetic rewrites achieve an MRR of 61.31, doubling the performance of human rewrites (which sit around 30).
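For readers unfamiliar with the metric, mean reciprocal rank (MRR) averages 1/rank of the first relevant result across queries. The sketch below uses the standard definition; the figure of 61.31 quoted above is presumably this value multiplied by 100, which is an assumption about the paper's reporting scale.

```typescript
// ranks[i] is the 1-based rank of the first relevant document for query i,
// or null when no relevant document was retrieved for that query.
function meanReciprocalRank(ranks: Array<number | null>): number {
  if (ranks.length === 0) return 0;
  const total = ranks.reduce<number>((sum, r) => sum + (r === null ? 0 : 1 / r), 0);
  return total / ranks.length;
}
```

For example, ranks of 1, 2, none, and 4 give (1 + 0.5 + 0 + 0.25) / 4 = 0.4375.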

Zheng, et al. (2025). Can synthetic query rewrites capture user intent better than humans in retrieval-augmented generation? arXiv:2509.22325. DOI:10.48550/arXiv.2509.22325

arXiv:2509.22325 · Semantic Scholar

The four mechanisms

Self-slop decomposes into four empirically documented mechanisms:

Latent Resonance (Self-Preference / Self-Recognition). Model-generated text occupies the same high-probability regions of the target model's latent space as its training distribution. The model recognizes its own generational signature and assigns higher likelihood to continuations of that signature.
In-Distribution Query Optimization. When a model rewrites a retrieval query, it produces vocabulary, syntactic structures, and semantic framings that are better aligned with the retriever's embedding space and the generator's parametric knowledge.
Autogenous Adversarial Capability. A model's ability to jailbreak itself emerges from its privileged access to its own refusal boundaries and latent safety representations. The same self-knowledge that enables self-jailbreaking enables self-prompting.
Distributional Translation via Buffer Models. A small local model can act as a "human-to-model" translator, rewriting out-of-distribution human prompts into in-distribution model-native prompts before expensive API calls.

Implementation architecture

In the r harness, self-slop runs as a rewrite layer with five modes:

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#c2410c', 'edgeLabelBackground':'#c2410c', 'tertiaryColor': '#7c3a1a'}}}%%
flowchart TD
    USER["User Input"] --> REWRITE["Rewrite Layer\n(self-slop.ts)"]
    REWRITE --> MODES{"Mode Selection"}
    MODES -->|rewrite| R1["Standard Rewrite\nclarity + structure"]
    MODES -->|buffer| R2["Buffer Model\nmodel-native vocabulary"]
    MODES -->|auto| R3["Auto Mode\nmaximize quality + tool hints"]
    MODES -->|codex_research| R4["Codex Research\nstructured task + constraints"]
    MODES -->|codex_analysis| R5["Codex Analysis\ninvestigation steps + artifacts"]
    R1 --> MODEL["Target Model"]
    R2 --> BUFFER["Buffer Model\n(Phi-3.5-mini)"] --> MODEL
    R3 --> MODEL
    R4 --> MODEL
    R5 --> MODEL
    MODEL --> RESPONSE["Optimized Response"]
    CONTEXT["Context Injection\nagents + skills + workflows"] --> MODES
    style USER fill:#1a1a2e,stroke:#c2410c,color:#fff
    style REWRITE fill:#c2410c,color:#fff
    style MODES fill:#1a1a2e,stroke:#c2410c,color:#fff
    style R1 fill:#16213e,stroke:#c2410c,color:#e0e0e0
    style R2 fill:#16213e,stroke:#c2410c,color:#e0e0e0
    style R3 fill:#16213e,stroke:#c2410c,color:#e0e0e0
    style R4 fill:#16213e,stroke:#c2410c,color:#e0e0e0
    style R5 fill:#16213e,stroke:#c2410c,color:#e0e0e0
    style BUFFER fill:#2d1810,stroke:#c2410c,color:#e0e0e0
    style MODEL fill:#0f3460,stroke:#c2410c,color:#e0e0e0
    style RESPONSE fill:#1e3a2e,stroke:#4ade80,color:#fff
    style CONTEXT fill:#1a1a2e,stroke:#4a4a4a,color:#888

Rewrite — straightforward optimization of clarity and structure.
Buffer — translation into model-native vocabulary. Uses a separate buffer model (like Phi-3.5-mini) that specializes in prompt structure rather than reasoning.
Auto — maximizes quality, depth, and usefulness, including tool-use hints. Injects context about available agents, skills, and workflows.
Codex research — structures tasks for external research agents with web search and sandbox.
Codex analysis — structures systematic code investigation with exploration steps and actionable recommendations.

The implementation lives in self-slop.ts — about 200 lines of rewrite logic, prompt templating, model resolution, and context injection.
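The contents of self-slop.ts are not reproduced here, but the mode dispatch could plausibly look something like the sketch below. The mode names come from the diagram; the `MODE_GOALS` wording, the `buildRewritePrompt` function, and the way context is appended are assumptions for illustration only.

```typescript
type RewriteMode = "rewrite" | "buffer" | "auto" | "codex_research" | "codex_analysis";

// Hypothetical one-line goals per mode; the real templates are not shown in this post.
const MODE_GOALS: Record<RewriteMode, string> = {
  rewrite: "Improve clarity and structure.",
  buffer: "Translate into model-native vocabulary.",
  auto: "Maximize quality, depth, and usefulness; add tool-use hints.",
  codex_research: "Structure the task and constraints for an external research agent.",
  codex_analysis: "Lay out investigation steps and expected artifacts.",
};

// Build the prompt sent to the rewriter. Context about agents, skills, and
// workflows is injected only when the caller provides some.
function buildRewritePrompt(mode: RewriteMode, userPrompt: string, context: string[] = []): string {
  const parts = [MODE_GOALS[mode]];
  if (context.length > 0) {
    parts.push("Available context:\n" + context.map((c) => `- ${c}`).join("\n"));
  }
  parts.push("Prompt to rewrite:\n" + userPrompt);
  return parts.join("\n\n");
}
```

Typing the table as `Record<RewriteMode, string>` means the compiler flags any mode that is added to the union without a matching template.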

The buffer model insight

The more interesting variant is the buffer model: using a different model for rewriting than for answering. The intuition comes from model specialization.

A smaller, faster model can serve as a buffer — rewriting prompts quickly, cheaply, before passing them to a larger, more capable model for the actual response. The buffer model doesn't need to be strong at reasoning; it needs to be strong at prompt structure.

This creates a division of labor within the inference stack:

Buffer model — prompt optimization, low cost, fast. Specializes in understanding prompt structure and generating effective rewrites.
Primary model — response generation, higher cost, higher quality. Gets an optimized prompt and produces a better response.

The buffer model acts like a compiler front end: it turns loosely structured source into a normalized intermediate form before the back end generates machine code. The analogy extends: just as a compiler transforms high-level code into efficient low-level code, the buffer model transforms vague human intent into structured model-native prompts.

Connection to agentic loops

Self-slop is a small instance of a larger pattern: the agentic loop. An agent system that can optimize its own inputs is more capable than one that can't.

The r harness implements this through a subagent orchestration layer ("The Skulk") with specialized agents:

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#c2410c', 'edgeLabelBackground':'#c2410c', 'tertiaryColor': '#7c3a1a'}}}%%
flowchart TD
    REYNARD["Reynard\n(Overseer)"]
    subgraph SCOUT["Reconnaissance"]
        PROWL["Prowl\n(Scout)"]
    end
    subgraph PLAN["Planning"]
        MAPPER["Mapper\n(Planner)"]
    end
    subgraph BUILD["Implementation"]
        FORGE["Forge\n(Implementer)"]
    end
    subgraph REVIEW["Review"]
        SHREWD["Shrewd\n(Reviewer)"]
        JINX["Jinx\n(Critic)"]
    end
    subgraph TEST["Testing"]
        TRIAL["Trial\n(Tester)"]
    end
    subgraph SUPPORT["Support"]
        KIT["Kit\n(Apprentice)"]
        VIXEN["Vixen\n(Analyst)"]
        BRUSH["Brush\n(Refactorer)"]
        ECHO["Echo\n(Observer)"]
    end
    REYNARD --> SCOUT
    REYNARD --> PLAN
    REYNARD --> BUILD
    REYNARD --> REVIEW
    REYNARD --> TEST
    REYNARD --> SUPPORT
    style REYNARD fill:#c2410c,color:#fff
    style SCOUT fill:#1a1a2e,stroke:#c2410c,color:#fff
    style PLAN fill:#1a1a2e,stroke:#c2410c,color:#fff
    style BUILD fill:#1a1a2e,stroke:#c2410c,color:#fff
    style REVIEW fill:#1a1a2e,stroke:#c2410c,color:#fff
    style TEST fill:#1a1a2e,stroke:#c2410c,color:#fff
    style SUPPORT fill:#1a1a2e,stroke:#4a4a4a,color:#888

Agents run in three modes:

Single — one agent handles one task.
Parallel — fan out up to 7 agents, max 4 concurrent. Worktree isolation for parallel tasks.
Chain — sequential, where one agent's output feeds the next step via {previous} substitution.
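Chain mode is the easiest of the three to illustrate. The sketch below assumes an agent is just a function from task text to output text and that `{previous}` is replaced literally wherever it appears; `Agent`, `Step`, and `runChain` are hypothetical names, not the harness's API.

```typescript
type Agent = (task: string) => string;
type Step = { agent: Agent; task: string };

// Run steps in order, replacing every {previous} in a task with the prior
// step's output. The first step sees an empty string for {previous}.
function runChain(steps: Step[]): string {
  let previous = "";
  for (const step of steps) {
    // split/join does a literal replacement, so "$" in the output is safe.
    const task = step.task.split("{previous}").join(previous);
    previous = step.agent(task);
  }
  return previous;
}
```

Parallel mode would need a concurrency limiter on top of this (4 at a time, up to 7 in total), which is left out here because it only makes sense with asynchronous agents.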

Self-slop fits into this as a preprocessing stage: before the agent even starts working, its prompt gets optimized. It's one layer in a stack of optimization layers.

Compaction and context management

Self-slop addresses input quality. Compaction addresses context longevity. Both are necessary for long-running agentic sessions.

flowchart TD
    SESSION["Agent Session"]
    subgraph CONTINUOUS["Continuous Compaction"]
        INTERVAL["Every 5 turns\n(configurable)"]
        LIGHTWEIGHT["Lightweight summarization"]
        SMALL["Small model\nPhi-3.5-mini / LFM2.5-350M\nSmolLM2-135M"]
    end
    subgraph PRESSURE["Pressure-Based Compaction"]
        WARNING["Warning: preload model\nenable tool output summarization"]
        CRITICAL["Critical: reduce verbosity\nsave compute resources"]
        EMERGENCY["Emergency: immediate\naggressive compaction"]
    end
    SESSION --> CONTINUOUS
    SESSION --> PRESSURE
    INTERVAL --> LIGHTWEIGHT --> SMALL
    WARNING --> CRITICAL --> EMERGENCY
    style SESSION fill:#c2410c,color:#fff
    style CONTINUOUS fill:#1a1a2e,stroke:#4ade80,color:#fff
    style PRESSURE fill:#1a1a2e,stroke:#dc2626,color:#fff
    style INTERVAL fill:#16213e,stroke:#4ade80,color:#e0e0e0
    style LIGHTWEIGHT fill:#16213e,stroke:#4ade80,color:#e0e0e0
    style SMALL fill:#16213e,stroke:#4ade80,color:#e0e0e0
    style WARNING fill:#16213e,stroke:#f59e0b,color:#e0e0e0
    style CRITICAL fill:#16213e,stroke:#dc2626,color:#e0e0e0
    style EMERGENCY fill:#16213e,stroke:#dc2626,color:#e0e0e0

Continuous compaction keeps the session lean:

Triggers every 5 turns (configurable).
Lightweight summarization via small models.
Dedicated trim budget (8192 tokens).
Keeps the structured anchor fresh and mitigates the "Lost in the Middle" phenomenon, where models attend poorly to information buried mid-context.

Pressure-based compaction responds to context limits:

Warning — preloads the compaction model and enables tool output summarization.
Critical — reduces overall summarization verbosity to save compute resources.
Emergency — forces immediate, aggressive compaction.
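Both triggers reduce to small predicates. In the sketch below, the 5-turn interval is taken from the list above, while the usage thresholds for each pressure level are placeholders invented for illustration; the post does not state the real values.

```typescript
type Pressure = "none" | "warning" | "critical" | "emergency";

// Continuous compaction: fire on every Nth completed turn (5 by default).
function isCompactionTurn(turn: number, interval = 5): boolean {
  return turn > 0 && turn % interval === 0;
}

// ASSUMPTION: placeholder thresholds, expressed as a fraction of the context window.
const THRESHOLDS = { warning: 0.7, critical: 0.85, emergency: 0.95 };

// Pressure-based compaction: classify how full the context window is.
function pressureLevel(usedTokens: number, contextWindow: number): Pressure {
  const usage = usedTokens / contextWindow;
  if (usage >= THRESHOLDS.emergency) return "emergency";
  if (usage >= THRESHOLDS.critical) return "critical";
  if (usage >= THRESHOLDS.warning) return "warning";
  return "none";
}
```

Checking the most severe level first keeps the levels mutually exclusive without needing explicit upper bounds.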

The compaction fleet runs across mesh nodes:

Host	Model	Context	Max Tokens
otter_den	LFM2.5-350M-Q8_0	262144	8192
inkwell	qwen2.5-1.5b-instruct Q3_K_S	262144	—
laptop	SmolLM2-135M-Instruct-Q8_0	32768	256

The agent routes compaction requests based on service priority and proximity (isLocal). A circuit breaker handles automatic failover on endpoint failure.
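A minimal sketch of that routing, under stated assumptions: endpoints carry a numeric `priority` (lower is better) and the `isLocal` flag, local endpoints are tried first, and the circuit opens after three consecutive failures. The ordering rule, the failure limit, and every name except `isLocal` are illustrative guesses rather than the harness's actual behavior.

```typescript
type Endpoint = { host: string; priority: number; isLocal: boolean };

// Minimal circuit breaker: a host's circuit opens after maxFailures
// consecutive failures and closes again on the next recorded success.
class CircuitBreaker {
  private failures = new Map<string, number>();
  private maxFailures: number;
  constructor(maxFailures = 3) {
    this.maxFailures = maxFailures;
  }
  isOpen(host: string): boolean {
    return (this.failures.get(host) ?? 0) >= this.maxFailures;
  }
  recordFailure(host: string): void {
    this.failures.set(host, (this.failures.get(host) ?? 0) + 1);
  }
  recordSuccess(host: string): void {
    this.failures.delete(host);
  }
}

// Prefer local endpoints, then lower priority numbers; skip hosts whose
// circuit is open. Returns undefined when every endpoint is unavailable.
function pickEndpoint(endpoints: Endpoint[], breaker: CircuitBreaker): Endpoint | undefined {
  return endpoints
    .filter((e) => !breaker.isOpen(e.host))
    .sort((a, b) => Number(b.isLocal) - Number(a.isLocal) || a.priority - b.priority)[0];
}
```

A production breaker would usually add a cool-down so an open circuit is retried after some time; that is omitted here for brevity.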

This matters because context window exhaustion is the primary failure mode for long-running agents. Without compaction, sessions degrade. With it, they can run indefinitely — bounded by compute, not by context.

Loop detection

An agentic system that can optimize its own inputs and manage its own context needs safeguards against degenerative behavior. Loop detection prevents three classes of failure:

flowchart TD
    SUBJECT["Agentic System"]
    subgraph RECURSION["Agent Recursion Loops"]
        R1["Subagent A spawns Subagent B"]
        R2["Subagent B spawns Subagent C"]
        R3["Subagent C spawns Subagent A\nDEGENERATIVE"]
    end
    subgraph GENERATION["Generation Repetition Loops"]
        G1["Model produces output"]
        G2["Output repeats pattern"]
        G3["Pattern repeats indefinitely\nDEGENERATIVE"]
    end
    subgraph INFRASTRUCTURE["Infrastructure Loops"]
        I1["Benchmark runs"]
        I2["No progress detected"]
        I3["Script spins without end\nDEGENERATIVE"]
    end
    SUBJECT --> RECURSION
    SUBJECT --> GENERATION
    SUBJECT --> INFRASTRUCTURE
    RECURSION -->|Guard: depth limit + dedup| SAFE1["Safe: bounded recursion"]
    GENERATION -->|Guard: penalties + window detection| SAFE2["Safe: varied output"]
    INFRASTRUCTURE -->|Guard: audit + test plans| SAFE3["Safe: controlled execution"]
    style SUBJECT fill:#c2410c,color:#fff
    style RECURSION fill:#1a1a2e,stroke:#dc2626,color:#fff
    style GENERATION fill:#1a1a2e,stroke:#dc2626,color:#fff
    style INFRASTRUCTURE fill:#1a1a2e,stroke:#dc2626,color:#fff
    style SAFE1 fill:#1e3a2e,stroke:#4ade80,color:#fff
    style SAFE2 fill:#1e3a2e,stroke:#4ade80,color:#fff
    style SAFE3 fill:#1e3a2e,stroke:#4ade80,color:#fff

Agent recursion — subagents spawning subagents indefinitely, consuming context and compute. Guarded by depth limits (max 3 levels), compaction retry limits (max 6 retries, 120s timeout), test turn limits (max 5 turns), and task deduplication via SHA-256 hashing of agent+task pairs.
Generation repetition — models producing degenerate repeating output (the classic "blah blah blah" failure mode). Guarded by sampler penalties (repeat penalty, frequency penalty, presence penalty, DRY multiplier), prompt-loop detection with sliding windows and escalation, and drift correction with periodic checks for output divergence.
Infrastructure loops — benchmark and deployment scripts spinning without progress. Guarded by configuration audits, test plans, and PEG parser epsilon guards with zero-progress break conditions.
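The recursion guard is the most mechanical of the three, so here is a sketch of it: a depth check plus deduplication keyed on a SHA-256 hash of the agent+task pair. The depth limit of 3 comes from the list above; the `SpawnGuard` class, the newline separator between agent and task, and the convention that depths above 3 are rejected are assumptions. It uses Node's built-in `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

const MAX_DEPTH = 3; // depth limit quoted above

class SpawnGuard {
  private seen = new Set<string>();

  // SHA-256 over the agent+task pair. ASSUMPTION: joined with a newline.
  private key(agent: string, task: string): string {
    return createHash("sha256").update(`${agent}\n${task}`).digest("hex");
  }

  // Returns true if the spawn may proceed and records it for later dedup.
  // Returns false when the depth limit is exceeded or the pair was seen before.
  allow(agent: string, task: string, depth: number): boolean {
    if (depth > MAX_DEPTH) return false;
    const k = this.key(agent, task);
    if (this.seen.has(k)) return false;
    this.seen.add(k);
    return true;
  }
}
```

With this in place, the A spawns B spawns C spawns A cycle from the diagram is cut either by the depth check or by the repeated agent+task hash, whichever trips first.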

These are not optional. An agentic system without loop detection will eventually loop. The question is only when and how badly.

The broader point

Self-slop is a concrete implementation of a general principle: systems that can optimize their own operation outperform systems that can't.

The model that optimizes its own prompts produces better responses.
The agent that compacts its own context runs longer without degradation.
The system that detects its own loops avoids degenerative behavior.
The orchestrator that delegates to specialized agents solves more problems than a generalist.

Each layer adds capability. Each layer adds complexity. The tradeoff is real, but the direction is clear.

What this means

Self-slop is not slop. It is not model collapse or synthetic-data degeneracy. It is the deliberate exploitation of distributional alignment between prompt and model. When a practitioner buffers their human intent through a model-native rewriter, they are not saving tokens; they are speaking the model's first language.

The implications are economic as well as epistemic. As LLMs become the primary epistemic intermediaries for an increasing share of knowledge work, the interface language matters. Human prompt engineering — the art of guessing what phrasing will activate the model's capabilities — is being outcompeted by model-native optimization, which knows the answer because it is the model.

The future belongs not to the best human prompt engineer, but to the best model whisperer: the practitioner who knows how to ask the model to ask itself.

Where this goes

This blog covers research on latent resonance, prompt engineering, model behavior, and distributed systems. Future posts will dig deeper into specific areas:

Prompt template architecture and the geometry of effective prompting.
Distributed compaction across mesh networks.
Loop detection mechanisms and their failure modes.
Empirical measurement of self-slop effect sizes.

Sly.so