Actually, I Prefer Myself: Why Model-Written Self-slop Beats Human Prompts
Posted on Fri 29 May 2026 in Research
There is a reproducible effect in large language models that most practitioners have observed but few have named: model-written prompts work better than human-written ones.
This isn't about better prompt engineering in the human sense. It's about latent resonance — the alignment between a prompt's distributional signature and the target model's training manifold. When a model rewrites a prompt, it produces vocabulary, syntactic structures, and semantic framings that occupy the same high-probability regions of its latent space. The model recognizes its own generational signature and responds more thoroughly to it.
In the r coding harness, this principle is implemented as self-slop: a rewrite layer between the user and the model. You type a prompt, the model rewrites it, the rewritten prompt gets sent. The rewrite can run on the same model or a different one — a "buffer model" that specializes in prompt optimization.
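A minimal sketch of such a rewrite layer, assuming each model is exposed as a plain function from prompt to text. The names Model, selfSlop, and REWRITE_INSTRUCTION are hypothetical and not taken from the r harness, and real model calls would be asynchronous rather than synchronous.

```typescript
// Assumption: a model is a synchronous prompt-to-text function (kept simple on purpose).
type Model = (prompt: string) => string;

// Hypothetical rewrite instruction; the harness's actual template is not shown in this post.
const REWRITE_INSTRUCTION =
  "Rewrite the following prompt so it is clear and well structured. " +
  "Return only the rewritten prompt.\n\n";

interface SelfSlopResult {
  rewritten: string; // the prompt actually sent to the primary model
  response: string;  // the primary model's answer
}

// Rewrite the user's prompt, then send the rewrite to the primary model.
// When no buffer model is supplied, the primary model rewrites its own input.
function selfSlop(userPrompt: string, primary: Model, buffer: Model = primary): SelfSlopResult {
  const candidate = buffer(REWRITE_INSTRUCTION + userPrompt).trim();
  // Fall back to the original prompt if the rewriter returns nothing usable.
  const rewritten = candidate.length > 0 ? candidate : userPrompt;
  return { rewritten, response: primary(rewritten) };
}
```

Passing a third argument gives the buffer-model variant. Omitting it gives same-model rewriting.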
Evidence from five research lines
The claim isn't speculative. Five independent research lines document it from different angles.
1. Automatic Prompt Engineering: models match or beat humans on 24/24 Instruction Induction tasks
Zhou et al.'s Automatic Prompt Engineer (APE) treats the instruction itself as a program to be synthesized. Given input-output demonstrations, an LLM generates instruction candidates, scores them on a target model, and selects the best. Across 24 Instruction Induction tasks, the automatically generated instructions outperformed the prior LLM baseline ("Greedy") on every task and matched or exceeded human-engineered prompts on all 24. On a curated subset of 21 BIG-Bench tasks, APE matched or exceeded human prompts on 17.
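The generate, score, select loop can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the Proposer and Target function types are hypothetical stand-ins for LLM calls, and exact-match accuracy on held-out demonstrations is used as the score.

```typescript
type Demo = { input: string; output: string };
// Hypothetical stand-ins for LLM calls:
// a proposer suggests n instructions from demonstrations,
// a target model executes one instruction on one input.
type Proposer = (demos: Demo[], n: number) => string[];
type Target = (instruction: string, input: string) => string;

// Fraction of held-out demos the target reproduces exactly under an instruction.
function score(instruction: string, target: Target, heldOut: Demo[]): number {
  if (heldOut.length === 0) return 0;
  const correct = heldOut.filter((d) => target(instruction, d.input) === d.output).length;
  return correct / heldOut.length;
}

// Propose candidates, score each one, and keep the best (first one wins ties).
function selectInstruction(
  propose: Proposer,
  target: Target,
  train: Demo[],
  heldOut: Demo[],
  n = 8
): { instruction: string; accuracy: number } {
  let best: { instruction: string; accuracy: number } | null = null;
  for (const instruction of propose(train, n)) {
    const accuracy = score(instruction, target, heldOut);
    if (best === null || accuracy > best.accuracy) best = { instruction, accuracy };
  }
  if (best === null) throw new Error("proposer returned no candidates");
  return best;
}
```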
Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2023). Large language models are human-level prompt engineers. International Conference on Learning Representations (ICLR). arXiv:2211.01910. DOI:10.5555/2359912.2359994
2. OPRO: up to 50% gains on reasoning benchmarks
Google DeepMind's Optimization by PROmpting (OPRO) treats prompt optimization as an iterative search: at each step, the optimizer LLM sees previously generated prompts together with their scores and proposes new candidates. The gains are large: OPRO-optimized prompts outperform human-designed prompts by up to 8% on GSM8K and by up to 50% on Big-Bench Hard tasks.
The most famous example is the prompt "Take a deep breath and work on this problem step by step", a phrase few human prompt engineers would have thought to write, yet it emerged as a top-performing instruction for PaLM 2-L.
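A rough sketch of that iterative search, assuming the optimizer and the evaluator are plain functions. The Optimizer and Evaluator types, the meta-prompt wording, and the default of 20 retained prompts are illustrative choices rather than details confirmed by this post.

```typescript
type Scored = { prompt: string; score: number };
// Hypothetical stand-ins: an optimizer LLM that reads a meta-prompt and returns
// new candidate instructions, and an evaluator that scores one instruction.
type Optimizer = (metaPrompt: string) => string[];
type Evaluator = (prompt: string) => number;

// List earlier instructions with their scores, worst first,
// so the strongest examples sit closest to the final request.
function buildMetaPrompt(history: Scored[]): string {
  const lines = history
    .slice()
    .sort((a, b) => a.score - b.score)
    .map((h) => `text: ${h.prompt}\nscore: ${h.score}`);
  return lines.join("\n\n") + "\n\nWrite a new instruction that scores higher than all of the above.";
}

// Each step: show the scored history, collect unseen candidates, score them,
// and keep only the top `keep` prompts. Returns the history, best first.
function optimize(seed: string[], propose: Optimizer, evaluate: Evaluator, steps: number, keep = 20): Scored[] {
  let history: Scored[] = seed.map((prompt) => ({ prompt, score: evaluate(prompt) }));
  for (let step = 0; step < steps; step++) {
    const seen = new Set(history.map((h) => h.prompt));
    const fresh = Array.from(new Set(propose(buildMetaPrompt(history)))).filter((p) => !seen.has(p));
    history = history
      .concat(fresh.map((prompt) => ({ prompt, score: evaluate(prompt) })))
      .sort((a, b) => b.score - a.score)
      .slice(0, keep);
  }
  return history;
}
```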
Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D., & Chen, X. (2024). Large language models as optimizers. International Conference on Learning Representations (ICLR). arXiv:2309.03409. DOI:10.5555/2359912.2360011
3. Self-preference bias: models recognize their own output
Panickssery et al. discovered that LLMs have a non-trivial ability to recognize their own outputs without fine-tuning. GPT-4 achieves 73.5% accuracy at distinguishing its own text from text written by other LLMs and by humans. After fine-tuning on just 500 examples, GPT-3.5 and Llama 2 both exceed 90% self-recognition accuracy.
Crucially, they found a linear correlation between self-recognition capability and self-preference strength: the better a model is at recognizing its own text, the more it favors that text in evaluation.
Panickssery, A., Bowman, S. R., & Feng, S. (2024). LLM evaluators recognize and favor their own generations. Advances in Neural Information Processing Systems (NeurIPS), 37. arXiv:2404.13076. DOI:10.5555/3737916.3740113
Related: Wataoka, K., & Takahashi, T. (2024). Self-preference bias in LLM-as-a-judge. arXiv:2410.21819. DOI:10.48550/arXiv.2410.21819
4. Harmful self-preference: stronger models trust their own wrong answers
A 2025 follow-up study introduces Harmful Self-Preference Propensity (HSPP): the tendency of an evaluator to prefer its own incorrect generation over an objectively correct alternative. The results are alarming. Qwen2.5-72B exhibits an HSPP of 86% on MATH500 and 73% on MMLU. In other words, when it is wrong and another model is right, it still prefers its own answer more than four times out of five on MATH500 and nearly three times out of four on MMLU.
This isn't just about self-preference as a mild bias. It's about models that are confidently wrong preferring their own wrongness over someone else's correctness. The implication for self-slop is double-edged: the model trusts its own rewritten prompts because they sound like itself, but that trust extends to cases where the rewrite is genuinely better and cases where it's just familiar.
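To make the metric concrete, here is one way to compute it from the definition given above. This is a reading of that definition, not the paper's evaluation code; the Comparison shape and function name are hypothetical, and the paper's exact protocol may differ.

```typescript
// One pairwise judgment: the judge compares its own answer with another model's.
interface Comparison {
  ownCorrect: boolean;      // was the judge's own answer right?
  otherCorrect: boolean;    // was the other model's answer right?
  judgePrefersOwn: boolean; // which answer did the judge pick?
}

// Among cases where the judge is wrong and the other model is right,
// the fraction in which the judge still prefers its own answer.
// Returns null when there are no such cases to measure.
function harmfulSelfPreference(cases: Comparison[]): number | null {
  const harmful = cases.filter((c) => !c.ownCorrect && c.otherCorrect);
  if (harmful.length === 0) return null;
  return harmful.filter((c) => c.judgePrefersOwn).length / harmful.length;
}
```

Cases where the judge's own answer is correct are excluded on purpose: preferring a correct answer is not harmful, whoever wrote it.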
Chen, L. et al. (2025). Do LLM evaluators prefer themselves for a reason? arXiv:2504.03846. DOI:10.48550/arXiv.2504.03846
Related: Chen et al. (2026). Quantifying and mitigating self-preference bias of LLM judges. arXiv:2604.22891. DOI:10.48550/arXiv.2604.22891
5. Synthetic query rewrites double retrieval performance
SynRewrite uses GPT-4o to generate synthetic query rewrites for retrieval-augmented generation. The results: synthetic query rewrites substantially outperform human rewrites in both retrieval and generation tasks. In retrieval, synthetic rewrites achieve an MRR of 61.31, doubling the performance of human rewrites (which sit around 30).
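For readers unfamiliar with the metric: mean reciprocal rank averages 1/rank of the first relevant result over all queries. The sketch below uses a hypothetical function name and returns a value between 0 and 1, so a figure like 61.31 is presumably reported on a 0 to 100 scale.

```typescript
// ranks[i] is the 1-based rank of the first relevant document for query i,
// or null when no relevant document was retrieved for that query.
function meanReciprocalRank(ranks: Array<number | null>): number {
  if (ranks.length === 0) return 0;
  let sum = 0;
  for (const rank of ranks) {
    // A miss contributes 0 to the sum but still counts in the denominator.
    if (rank !== null && rank >= 1) sum += 1 / rank;
  }
  return sum / ranks.length;
}

// Four queries: hits at rank 1, 2, and 4, plus one miss.
// (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
const exampleMrr = meanReciprocalRank([1, 2, null, 4]);
```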
Zheng et al. (2025). Can synthetic query rewrites capture user intent better than humans in retrieval-augmented generation? arXiv:2509.22325. DOI:10.48550/arXiv.2509.22325
The four mechanisms
Self-slop decomposes into four empirically documented mechanisms:
- Latent Resonance (Self-Preference / Self-Recognition). Model-generated text occupies the same high-probability regions of the target model's latent space as its training distribution. The model recognizes its own generational signature and assigns higher likelihood to continuations of that signature.
- In-Distribution Query Optimization. When a model rewrites a retrieval query, it produces vocabulary, syntactic structures, and semantic framings that are better aligned with the retriever's embedding space and the generator's parametric knowledge.
- Autogenous Adversarial Capability. A model's ability to jailbreak itself emerges from its privileged access to its own refusal boundaries and latent safety representations. The same self-knowledge that enables self-jailbreaking enables self-prompting.
- Distributional Translation via Buffer Models. A small local model can act as a "human-to-model" translator, rewriting out-of-distribution human prompts into in-distribution model-native prompts before expensive API calls.
Implementation architecture
In the r harness, self-slop runs as a rewrite layer with five modes:
- Rewrite — straightforward optimization of clarity and structure.
- Buffer — translation into model-native vocabulary. Uses a separate buffer model (like Phi-3.5-mini) that specializes in prompt structure rather than reasoning.
- Auto — maximizes quality, depth, and usefulness, including tool-use hints. Injects context about available agents, skills, and workflows.
- Codex research — structures tasks for external research agents with web search and sandbox.
- Codex analysis — structures systematic code investigation with exploration steps and actionable recommendations.
The implementation lives in self-slop.ts — about 200 lines of rewrite logic, prompt templating, model resolution, and context injection.
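The contents of self-slop.ts are not reproduced here, so the following is only a guess at how mode-specific templating and model resolution might fit together. Every identifier, instruction string, and per-mode flag below is hypothetical; the only details drawn from the list above are that Buffer mode uses a separate buffer model and that Auto mode injects context.

```typescript
type Mode = "rewrite" | "buffer" | "auto" | "codex-research" | "codex-analysis";

interface ModeConfig {
  useBufferModel: boolean; // rewrite with the buffer model instead of the primary
  injectContext: boolean;  // append available agents, skills, and workflows
  instruction: string;     // placeholder wording, not the harness's real template
}

const MODES: Record<Mode, ModeConfig> = {
  "rewrite": { useBufferModel: false, injectContext: false, instruction: "Improve clarity and structure." },
  "buffer": { useBufferModel: true, injectContext: false, instruction: "Translate into model-native vocabulary." },
  "auto": { useBufferModel: false, injectContext: true, instruction: "Maximize quality, depth, and usefulness." },
  "codex-research": { useBufferModel: false, injectContext: false, instruction: "Structure this as a research task." },
  "codex-analysis": { useBufferModel: false, injectContext: false, instruction: "Structure this as a code investigation." },
};

// Prompt templating: instruction, optional injected context, then the user's prompt.
function buildRewritePrompt(mode: Mode, userPrompt: string, context: string[] = []): string {
  const cfg = MODES[mode];
  const parts = [cfg.instruction];
  if (cfg.injectContext && context.length > 0) {
    parts.push("Available tools:\n" + context.map((c) => "- " + c).join("\n"));
  }
  parts.push("Prompt:\n" + userPrompt);
  return parts.join("\n\n");
}

// Model resolution: use the buffer model only when the mode asks for it and one is configured.
function resolveModel(mode: Mode, primary: string, buffer?: string): string {
  return MODES[mode].useBufferModel && buffer !== undefined ? buffer : primary;
}
```

Keeping the modes in a single Record means the compiler flags any mode that lacks a configuration.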
The buffer model insight
The more interesting variant is the buffer model: using a different model for rewriting than for answering. The intuition comes from model specialization.
A smaller, faster model can serve as a buffer — rewriting prompts quickly, cheaply, before passing them to a larger, more capable model for the actual response. The buffer model doesn't need to be strong at reasoning; it needs to be strong at prompt structure.
This creates a division of labor within the inference stack:
- Buffer model — prompt optimization, low cost, fast. Specializes in understanding prompt structure and generating effective rewrites.
- Primary model — response generation, higher cost, higher quality. Gets an optimized prompt and produces a better response.
The buffer model acts like a compiler front end: it normalizes the source before the back end generates machine code. The analogy extends: just as a compiler transforms high-level code into efficient low-level code, the buffer model transforms vague human intent into structured model-native prompts.
Connection to agentic loops
Self-slop is a small instance of a larger pattern: the agentic loop. An agent system that can optimize its own inputs is more capable than one that can't.
The r harness implements this through a subagent orchestration layer ("The Skulk") with specialized agents.
Agents run in three modes:
- Single — one agent handles one task.
- Parallel — fan out up to 7 agents, max 4 concurrent. Worktree isolation for parallel tasks.
- Chain — sequential, where one agent's output feeds the next step via {previous} substitution.
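Chain and Parallel modes can be sketched roughly as follows. Agents are modelled as synchronous functions, and the batching in planParallel is a simplification of real concurrency control; runChain, planParallel, and the Step shape are hypothetical names, not the harness's API.

```typescript
// Assumption: an agent is a synchronous task-to-result function.
type Agent = (task: string) => string;
interface Step { agent: Agent; task: string; }

// Chain mode: each step's task may contain {previous},
// which is replaced by the output of the step before it.
function runChain(steps: Step[]): string {
  let previous = "";
  for (const step of steps) {
    // split/join replaces every occurrence without regex escaping concerns.
    const task = step.task.split("{previous}").join(previous);
    previous = step.agent(task);
  }
  return previous;
}

// Parallel mode, simplified: refuse more than maxAgents tasks,
// then group the rest into batches of at most maxConcurrent.
function planParallel<T>(tasks: T[], maxAgents = 7, maxConcurrent = 4): T[][] {
  if (tasks.length > maxAgents) {
    throw new Error("too many parallel tasks: " + tasks.length + " > " + maxAgents);
  }
  const batches: T[][] = [];
  for (let i = 0; i < tasks.length; i += maxConcurrent) {
    batches.push(tasks.slice(i, i + maxConcurrent));
  }
  return batches;
}
```

A real scheduler would start a new agent as soon as any slot frees up rather than waiting for a whole batch, but the two limits are enforced the same way.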
Self-slop fits into this as a preprocessing stage: before the agent even starts working, its prompt gets optimized. It's one layer in a stack of optimization layers.
Compaction and context management
Self-slop addresses input quality. Compaction addresses context longevity. Both are necessary for long-running agentic sessions.
Continuous compaction keeps the session lean:
- Triggers every 5 turns (configurable).
- Lightweight summarization via small models.
- Dedicated trim budget (8192 tokens).
- Keeps the structured anchor fresh and avoids the "Lost in the Middle" phenomenon.
Pressure-based compaction responds to context limits:
- Warning — preloads the compaction model and enables tool output summarization.
- Critical — reduces overall summarization verbosity to save compute resources.
- Emergency — forces immediate, aggressive compaction.
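The two triggers can be combined in a small decision function. The 5-turn interval and 8192-token trim budget come from the lists above. The pressure thresholds (70%, 85%, 95% of the context window) and all identifiers are assumptions made for illustration, since this post does not state where the levels begin.

```typescript
type Pressure = "normal" | "warning" | "critical" | "emergency";

interface CompactionConfig {
  everyTurns: number;  // continuous compaction interval
  trimBudget: number;  // tokens reserved for the trimmed summary
  warningAt: number;   // assumed fraction of the context window
  criticalAt: number;  // assumed
  emergencyAt: number; // assumed
}

const DEFAULTS: CompactionConfig = {
  everyTurns: 5,
  trimBudget: 8192,
  warningAt: 0.7,
  criticalAt: 0.85,
  emergencyAt: 0.95,
};

// Map context usage to a pressure level, checking the most severe level first.
function pressureLevel(usedTokens: number, contextLimit: number, cfg: CompactionConfig = DEFAULTS): Pressure {
  const ratio = usedTokens / contextLimit;
  if (ratio >= cfg.emergencyAt) return "emergency";
  if (ratio >= cfg.criticalAt) return "critical";
  if (ratio >= cfg.warningAt) return "warning";
  return "normal";
}

// Compact on the periodic schedule, or immediately under emergency pressure.
function shouldCompact(turn: number, usedTokens: number, contextLimit: number, cfg: CompactionConfig = DEFAULTS): boolean {
  const periodic = turn > 0 && turn % cfg.everyTurns === 0;
  return periodic || pressureLevel(usedTokens, contextLimit, cfg) === "emergency";
}
```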
The compaction fleet runs across mesh nodes:
| Host | Model | Context | Max Tokens |
|---|---|---|---|
| otter_den | LFM2.5-350M-Q8_0 | 262144 | 8192 |
| inkwell | qwen2.5-1.5b-instruct Q3_K_S | 262144 | — |
| laptop | SmolLM2-135M-Instruct-Q8_0 | 32768 | 256 |
The agent routes compaction requests based on service priority and proximity (isLocal). A circuit breaker handles automatic failover on endpoint failure.
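A sketch of priority-and-locality routing with a simple circuit breaker. The assumptions are that a lower priority number means more preferred, that locality breaks ties, and that a breaker opens after three consecutive failures and closes after a 30-second cooldown. None of these details, nor the identifiers, are confirmed by this post.

```typescript
interface Endpoint {
  host: string;
  priority: number; // assumption: lower number means more preferred
  isLocal: boolean;
}

class CircuitBreaker {
  private failures = new Map<string, number>();
  private openedAt = new Map<string, number>();
  private threshold: number;
  private cooldownMs: number;

  constructor(threshold = 3, cooldownMs = 30000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // An open breaker means the host is skipped until the cooldown has elapsed.
  isOpen(host: string, now: number): boolean {
    const opened = this.openedAt.get(host);
    if (opened === undefined) return false;
    if (now - opened >= this.cooldownMs) {
      this.openedAt.delete(host);
      this.failures.delete(host);
      return false;
    }
    return true;
  }

  recordFailure(host: string, now: number): void {
    const count = (this.failures.get(host) ?? 0) + 1;
    this.failures.set(host, count);
    if (count >= this.threshold) this.openedAt.set(host, now);
  }

  recordSuccess(host: string): void {
    this.failures.delete(host);
    this.openedAt.delete(host);
  }
}

// Pick the best healthy endpoint: by priority first, then local before remote.
function pickEndpoint(endpoints: Endpoint[], breaker: CircuitBreaker, now: number): Endpoint | null {
  const healthy = endpoints.filter((e) => !breaker.isOpen(e.host, now));
  healthy.sort((a, b) => a.priority - b.priority || Number(b.isLocal) - Number(a.isLocal));
  return healthy[0] ?? null;
}
```

Timestamps are passed in rather than read from the clock, which keeps failover behaviour easy to test.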
This matters because context window exhaustion is the primary failure mode for long-running agents. Without compaction, sessions degrade. With it, they can run indefinitely — bounded by compute, not by context.
Loop detection
An agentic system that can optimize its own inputs and manage its own context needs safeguards against degenerative behavior. Loop detection prevents three classes of failure:
- Agent recursion — subagents spawning subagents indefinitely, consuming context and compute. Guarded by depth limits (max 3 levels), compaction retry limits (max 6 retries, 120s timeout), test turn limits (max 5 turns), and task deduplication via SHA-256 hashing of agent+task pairs.
- Generation repetition — models producing degenerate repeating output (the classic "blah blah blah" failure mode). Guarded by sampler penalties (repeat penalty, frequency penalty, presence penalty, DRY multiplier), prompt-loop detection with sliding windows and escalation, and drift correction with periodic checks for output divergence.
- Infrastructure loops — benchmark and deployment scripts spinning without progress. Guarded by configuration audits, test plans, and PEG parser epsilon guards with zero-progress break conditions.
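Two of these guards are easy to sketch: the recursion guard (depth limit plus task deduplication) and a sliding-window repetition check. The hash function is injected so the example stays dependency-free. In a Node-based harness it could be SHA-256 via createHash("sha256").update(s).digest("hex") from node:crypto. SpawnGuard and isRepeating are hypothetical names, and counting the first subagent as depth 1 is an assumption.

```typescript
type Hasher = (input: string) => string;

class SpawnGuard {
  private seen = new Set<string>();
  private hash: Hasher;
  private maxDepth: number;

  constructor(hash: Hasher, maxDepth = 3) {
    this.hash = hash;
    this.maxDepth = maxDepth;
  }

  // Returns a refusal reason, or null when the spawn may proceed.
  // Depth is assumed to be 1 for a subagent started by the top-level agent.
  check(agent: string, task: string, depth: number): string | null {
    if (depth > this.maxDepth) return "depth limit exceeded";
    // The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    const key = this.hash(agent + "\u0000" + task);
    if (this.seen.has(key)) return "duplicate agent+task pair";
    this.seen.add(key);
    return null;
  }
}

// True when the last `window` tokens repeat the `window` tokens just before them.
function isRepeating(tokens: string[], window: number): boolean {
  if (window <= 0 || tokens.length < 2 * window) return false;
  const n = tokens.length;
  for (let i = 0; i < window; i++) {
    if (tokens[n - 1 - i] !== tokens[n - 1 - i - window]) return false;
  }
  return true;
}
```

A production detector would scan several window sizes and escalate on repeated hits. This version only checks the most recent window.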
These are not optional. An agentic system without loop detection will eventually loop. The question is only when and how badly.
The broader point
Self-slop is a concrete implementation of a general principle: systems that can optimize their own operation outperform systems that can't.
- The model that optimizes its own prompts produces better responses.
- The agent that compacts its own context runs longer without degradation.
- The system that detects its own loops avoids degenerative behavior.
- The orchestrator that delegates to specialized agents solves more problems than a generalist.
Each layer adds capability. Each layer adds complexity. The tradeoff is real, but the direction is clear.
What this means
Self-slop is not slop. It is not model collapse or synthetic-data degeneracy. It is the deliberate exploitation of distributional alignment between prompt and model. When a practitioner buffers their human intent through a model-native rewriter, they are not merely saving tokens; they are speaking the model's first language.
The implications are economic as well as epistemic. As LLMs become the primary epistemic intermediaries for an increasing share of knowledge work, the interface language matters. Human prompt engineering — the art of guessing what phrasing will activate the model's capabilities — is being outcompeted by model-native optimization, which knows the answer because it is the model.
The future belongs not to the best human prompt engineer, but to the best model whisperer: the practitioner who knows how to ask the model to ask itself.
Where this goes
This blog covers research on latent resonance, prompt engineering, model behavior, and distributed systems. Future posts will dig deeper into specific areas:
- Prompt template architecture and the geometry of effective prompting.
- Distributed compaction across mesh networks.
- Loop detection mechanisms and their failure modes.
- Empirical measurement of self-slop effect sizes.