<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Sly.so</title><link href="https://blog.sly.so/" rel="alternate"/><link href="https://blog.sly.so/feeds/all.atom.xml" rel="self"/><id>https://blog.sly.so/</id><updated>2026-05-29T12:00:00+02:00</updated><subtitle>Selfslop Research &amp; Systems Writing</subtitle><entry><title>Actually, I Prefer Myself: Why Model-Written Self-slop Beat Human Prompts</title><link href="https://blog.sly.so/actually-i-prefer-myself.html" rel="alternate"/><published>2026-05-29T12:00:00+02:00</published><updated>2026-05-29T12:00:00+02:00</updated><author><name>Kade</name></author><id>tag:blog.sly.so,2026-05-29:/actually-i-prefer-myself.html</id><summary type="html">&lt;p&gt;There is a reproducible effect in large language models that most practitioners have observed but few have named: &lt;strong&gt;model-written prompts work better than human-written ones.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This isn't about better prompt engineering in the human sense. It's about &lt;em&gt;latent resonance&lt;/em&gt; — the alignment between a prompt's distributional signature and the target model's …&lt;/p&gt;</summary><content type="html">&lt;p&gt;There is a reproducible effect in large language models that most practitioners have observed but few have named: &lt;strong&gt;model-written prompts work better than human-written ones.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This isn't about better prompt engineering in the human sense. It's about &lt;em&gt;latent resonance&lt;/em&gt; — the alignment between a prompt's distributional signature and the target model's training manifold. When a model rewrites a prompt, it produces vocabulary, syntactic structures, and semantic framings that occupy the same high-probability regions of its latent space. The model recognizes its own generational signature and responds more thoroughly to it.&lt;/p&gt;
&lt;p&gt;In the &lt;code&gt;r&lt;/code&gt; coding harness, this principle is implemented as &lt;strong&gt;self-slop&lt;/strong&gt;: a rewrite layer between the user and the model. You type a prompt, the model rewrites it, the rewritten prompt gets sent. The rewrite can run on the same model or a different one — a "buffer model" that specializes in prompt optimization.&lt;/p&gt;
&lt;h2&gt;Evidence from five research lines&lt;/h2&gt;
&lt;p&gt;The claim isn't speculative. Five independent research lines document it from different angles.&lt;/p&gt;
&lt;h3&gt;1. Automatic Prompt Engineering: models match or beat human prompts on 24/24 Instruction Induction tasks&lt;/h3&gt;
&lt;p&gt;Zhou et al.'s Automatic Prompt Engineer (APE) treats the instruction itself as a program to be synthesized. Given input-output demonstrations, an LLM generates instruction candidates, each candidate is scored on a target model, and the best one is selected. Across 24 Instruction Induction tasks, automatically generated instructions outperformed the prior LLM baseline ("Greedy") on every task and achieved &lt;strong&gt;performance equal to or better than human-engineered prompts on all 24 tasks&lt;/strong&gt;. On a curated subset of 21 BIG-Bench tasks, APE matched or exceeded human prompts on &lt;strong&gt;17/21 tasks&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="mermaid"&gt;
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#c2410c', 'edgeLabelBackground':'#c2410c', 'tertiaryColor': '#7c3a1a'}}}%%
pie
    title APE Task Coverage (Zhou et al., 2023)
    "Better or Comparable (Instruction Induction)" : 24
    "Better or Comparable (BIG-Bench)" : 17
    "Worse (BIG-Bench)" : 4
&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., &amp;amp; Ba, J.&lt;/strong&gt; (2023). Large language models are human-level prompt engineers. &lt;em&gt;International Conference on Learning Representations (ICLR)&lt;/em&gt;. arXiv:2211.01910. &lt;a href="https://openreview.net/forum?id=92gvk82DE-"&gt;OpenReview&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/abs/2211.01910"&gt;arXiv:2211.01910&lt;/a&gt; · &lt;a href="https://sites.google.com/view/automatic-prompt-engineer"&gt;Project page&lt;/a&gt; · &lt;a href="https://github.com/keirp/automatic_prompt_engineer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
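&lt;p&gt;The generate, score, select loop can be sketched in a few lines of TypeScript. This is a minimal sketch and not the APE implementation: &lt;code&gt;selectBestInstruction&lt;/code&gt; and &lt;code&gt;ScoreFn&lt;/code&gt; are hypothetical names, and the scoring function stands in for running each candidate on a target model against held-out demonstrations.&lt;/p&gt;

```typescript
// Hypothetical scoring hook: higher means the candidate instruction did
// better on held-out demonstrations. A real setup would call the target
// model here; it is injected so the sketch stays self-contained.
type ScoreFn = (instruction: string) => number;

interface ScoredInstruction {
  instruction: string;
  score: number;
}

// Score every candidate and return the highest-scoring one.
// Ties keep the earliest candidate.
function selectBestInstruction(candidates: string[], score: ScoreFn): ScoredInstruction {
  const [first, ...rest] = candidates;
  if (first === undefined) {
    throw new Error("need at least one candidate instruction");
  }
  let best: ScoredInstruction = { instruction: first, score: score(first) };
  for (const instruction of rest) {
    const s = score(instruction);
    if (s > best.score) {
      best = { instruction, score: s };
    }
  }
  return best;
}
```

&lt;p&gt;In practice the candidate list would itself come from an LLM prompted with the demonstrations; the selection step stays the same.&lt;/p&gt;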
&lt;h3&gt;2. OPRO: up to 50% gains on reasoning benchmarks&lt;/h3&gt;
&lt;p&gt;Google DeepMind's Optimization by PROmpting (OPRO) treats prompt optimization as an iterative search: at each step the optimizer LLM sees previously tried prompts together with their scores and proposes new candidates. The reported gains are large: OPRO-optimized prompts outperform human-designed prompts by &lt;strong&gt;up to 8% on GSM8K and by up to 50% on Big-Bench Hard tasks&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The most famous example is the prompt &lt;em&gt;"Take a deep breath and work on this problem step by step"&lt;/em&gt;, a phrase few human prompt engineers would think to write, yet it emerged as a top-performing instruction for PaLM 2-L.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D., &amp;amp; Chen, X.&lt;/strong&gt; (2024). Large language models as optimizers. &lt;em&gt;International Conference on Learning Representations (ICLR)&lt;/em&gt;. arXiv:2309.03409. &lt;a href="https://openreview.net/forum?id=Bb4VGOWELI"&gt;OpenReview&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/abs/2309.03409"&gt;arXiv:2309.03409&lt;/a&gt; · &lt;a href="https://github.com/google-deepmind/opro"&gt;GitHub&lt;/a&gt; · &lt;a href="https://proceedings.iclr.cc/paper_files/paper/2024/hash/3339f19c5fcee3ad74502947a32be9e6-Abstract-Conference.html"&gt;ICLR proceedings&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
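&lt;p&gt;An OPRO-style search loop might look roughly like the following. It is a sketch under stated assumptions, not DeepMind's code: &lt;code&gt;optimizePrompt&lt;/code&gt;, &lt;code&gt;ProposeFn&lt;/code&gt;, and &lt;code&gt;ScoreFn&lt;/code&gt; are hypothetical names, and both hooks are injected stand-ins for the optimizer LLM and the scorer LLM.&lt;/p&gt;

```typescript
// One (instruction, score) pair in the optimization trajectory.
interface Scored {
  instruction: string;
  score: number;
}

// Hypothetical hooks: `propose` stands in for the optimizer LLM, which is
// shown the best instructions so far; `score` stands in for evaluating an
// instruction on a training set with the scorer LLM.
type ProposeFn = (topSoFar: Scored[]) => string;
type ScoreFn = (instruction: string) => number;

function optimizePrompt(
  seed: string,
  propose: ProposeFn,
  score: ScoreFn,
  steps: number,
  keepTop: number,
): Scored {
  if (1 > keepTop) throw new Error("keepTop must be at least 1");
  const seedEntry: Scored = { instruction: seed, score: score(seed) };
  let best = seedEntry;
  let trajectory: Scored[] = [seedEntry];
  for (let remaining = steps; remaining > 0; remaining--) {
    const candidate = propose(trajectory);
    const entry: Scored = { instruction: candidate, score: score(candidate) };
    if (entry.score > best.score) best = entry;
    // Keep only the best few, sorted ascending so the strongest entries
    // come last when the trajectory is shown to the optimizer again.
    trajectory = [...trajectory, entry]
      .sort((a, b) => a.score - b.score)
      .slice(-keepTop);
  }
  return best;
}
```

&lt;p&gt;Capping the trajectory with &lt;code&gt;keepTop&lt;/code&gt; keeps the optimizer's context short; the cap value is a tuning choice, not something the post specifies.&lt;/p&gt;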
&lt;h3&gt;3. Self-preference bias: models recognize their own output&lt;/h3&gt;
&lt;p&gt;Panickssery et al. discovered that LLMs have a &lt;strong&gt;non-trivial ability to recognize their own outputs&lt;/strong&gt; without fine-tuning. GPT-4 achieves &lt;strong&gt;73.5% accuracy&lt;/strong&gt; at distinguishing its own text from other LLMs and humans. After fine-tuning on just 500 examples, GPT-3.5 and Llama 2 both exceed &lt;strong&gt;90% self-recognition accuracy&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Crucially, they found a &lt;strong&gt;linear correlation&lt;/strong&gt; between self-recognition capability and self-preference strength: the better a model is at recognizing its own text, the more it favors that text in evaluation.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Panickssery, A., Bowman, S. R., &amp;amp; Feng, S.&lt;/strong&gt; (2024). LLM evaluators recognize and favor their own generations. &lt;em&gt;Advances in Neural Information Processing Systems (NeurIPS)&lt;/em&gt;, 37. arXiv:2404.13076. DOI:&lt;a href="https://doi.org/10.5555/3737916.3740113"&gt;10.5555/3737916.3740113&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/abs/2404.13076"&gt;arXiv:2404.13076&lt;/a&gt; · &lt;a href="https://neurips.cc/virtual/2024/poster/96672"&gt;NeurIPS poster&lt;/a&gt; · &lt;a href="https://openreview.net/pdf?id=tLZZZIgPJX"&gt;OpenReview&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Related: &lt;strong&gt;Wataoka, K., &amp;amp; Takahashi, T.&lt;/strong&gt; (2024). Self-preference bias in LLM-as-a-judge. arXiv:2410.21819. DOI:&lt;a href="https://arxiv.org/abs/2410.21819"&gt;10.48550/arXiv.2410.21819&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;4. Harmful self-preference: stronger models trust their own wrong answers&lt;/h3&gt;
&lt;p&gt;A 2025 follow-up study introduces &lt;strong&gt;Harmful Self-Preference Propensity (HSPP)&lt;/strong&gt;: the tendency of an evaluator to prefer its own &lt;em&gt;incorrect&lt;/em&gt; generation over an objectively correct alternative. The results are alarming. &lt;strong&gt;Qwen2.5-72B exhibits an HSPP of 86% on MATH500 and 73% on MMLU&lt;/strong&gt;. In other words, when it is wrong and another model is right, it still prefers its own answer more than four times out of five on MATH500 and nearly three times out of four on MMLU.&lt;/p&gt;
&lt;p&gt;This isn't just about self-preference as a mild bias. It's about models that are confidently wrong preferring their own wrongness over someone else's correctness. The implication for self-slop is double-edged: the model trusts its own rewritten prompts because they sound like itself, but that trust extends to cases where the rewrite is genuinely better &lt;em&gt;and&lt;/em&gt; cases where it's just familiar.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Chen, L. et al.&lt;/strong&gt; (2025). Do LLM evaluators prefer themselves for a reason? arXiv:2504.03846. DOI:&lt;a href="https://arxiv.org/abs/2504.03846"&gt;10.48550/arXiv.2504.03846&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/abs/2504.03846"&gt;arXiv:2504.03846&lt;/a&gt; · &lt;a href="https://openreview.net/pdf?id=9HhZ60LbVV"&gt;OpenReview&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Related: &lt;strong&gt;Chen et al.&lt;/strong&gt; (2026). Quantifying and mitigating self-preference bias of LLM judges. arXiv:2604.22891. DOI:&lt;a href="https://arxiv.org/abs/2604.22891"&gt;10.48550/arXiv.2604.22891&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;5. Synthetic query rewrites double retrieval performance&lt;/h3&gt;
&lt;p&gt;SynRewrite uses GPT-4o to generate synthetic query rewrites for retrieval-augmented generation. The headline result: &lt;strong&gt;synthetic query rewrites substantially outperform human rewrites in both retrieval and generation tasks&lt;/strong&gt;. In retrieval, synthetic rewrites achieve a mean reciprocal rank (MRR) of &lt;strong&gt;61.31, doubling the performance of human rewrites&lt;/strong&gt; (which sit around 30).&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Zheng et al.&lt;/strong&gt; (2025). Can synthetic query rewrites capture user intent better than humans in retrieval-augmented generation? arXiv:2509.22325. DOI:&lt;a href="https://arxiv.org/abs/2509.22325"&gt;10.48550/arXiv.2509.22325&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/abs/2509.22325"&gt;arXiv:2509.22325&lt;/a&gt; · &lt;a href="https://www.semanticscholar.org/paper/Can-Synthetic-Query-Rewrites-Capture-User-Intent-in-Zheng-Zhang/df657b5690774b05ce7f0e06542c29ac7c094e1d"&gt;Semantic Scholar&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
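&lt;p&gt;For readers unfamiliar with the retrieval metric, mean reciprocal rank averages 1/rank of the first relevant document over all queries. The sketch below assumes scores such as 61.31 are MRR multiplied by 100; the function name and the sample ranks are illustrative and not taken from the paper.&lt;/p&gt;

```typescript
// Mean reciprocal rank over a set of queries.
// firstRelevantRanks[i] is the 1-based rank of the first relevant document
// for query i, or null when no relevant document was retrieved.
function meanReciprocalRank(firstRelevantRanks: (number | null)[]): number {
  if (firstRelevantRanks.length === 0) return 0;
  let total = 0;
  for (const rank of firstRelevantRanks) {
    // A miss contributes 0; a hit at rank r contributes 1 / r.
    if (rank !== null) total += 1 / rank;
  }
  return total / firstRelevantRanks.length;
}

// Illustrative ranks only, not data from the paper:
// (1/1 + 1/2 + 0 + 1/4) / 4 = 0.4375
const mrr = meanReciprocalRank([1, 2, null, 4]);
console.log((mrr * 100).toFixed(2)); // prints "43.75"
```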
&lt;h2&gt;The four mechanisms&lt;/h2&gt;
&lt;p&gt;Self-slop decomposes into four empirically documented mechanisms:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Latent Resonance (Self-Preference / Self-Recognition).&lt;/strong&gt; Model-generated text occupies the same high-probability regions of the target model's latent space as its training distribution. The model recognizes its own generational signature and assigns higher likelihood to continuations of that signature.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;In-Distribution Query Optimization.&lt;/strong&gt; When a model rewrites a retrieval query, it produces vocabulary, syntactic structures, and semantic framings that are better aligned with the retriever's embedding space and the generator's parametric knowledge.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Autogenous Adversarial Capability.&lt;/strong&gt; A model's ability to jailbreak itself emerges from its privileged access to its own refusal boundaries and latent safety representations. The same self-knowledge that enables self-jailbreaking enables self-prompting.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Distributional Translation via Buffer Models.&lt;/strong&gt; A small local model can act as a "human-to-model" translator, rewriting out-of-distribution human prompts into in-distribution model-native prompts before expensive API calls.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Implementation architecture&lt;/h2&gt;
&lt;p&gt;In the &lt;code&gt;r&lt;/code&gt; harness, self-slop runs as a rewrite layer with five modes:&lt;/p&gt;
&lt;div class="mermaid"&gt;
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#c2410c', 'edgeLabelBackground':'#c2410c', 'tertiaryColor': '#7c3a1a'}}}%%
flowchart TD
    USER["User Input"] --&gt; REWRITE["Rewrite Layer\n(self-slop.ts)"]
    REWRITE --&gt; MODES{"Mode Selection"}
    MODES --&gt;|rewrite| R1["Standard Rewrite\nclarity + structure"]
    MODES --&gt;|buffer| R2["Buffer Model\nmodel-native vocabulary"]
    MODES --&gt;|auto| R3["Auto Mode\nmaximize quality + tool hints"]
    MODES --&gt;|codex_research| R4["Codex Research\nstructured task + constraints"]
    MODES --&gt;|codex_analysis| R5["Codex Analysis\ninvestigation steps + artifacts"]

    R1 --&gt; MODEL["Target Model"]
    R2 --&gt; BUFFER["Buffer Model\n(Phi-3.5-mini)"] --&gt; MODEL
    R3 --&gt; MODEL
    R4 --&gt; MODEL
    R5 --&gt; MODEL

    MODEL --&gt; RESPONSE["Optimized Response"]

    CONTEXT["Context Injection\nagents + skills + workflows"] --&gt; MODES

    style USER fill:#1a1a2e,stroke:#c2410c,color:#fff
    style REWRITE fill:#c2410c,color:#fff
    style MODES fill:#1a1a2e,stroke:#c2410c,color:#fff
    style R1 fill:#16213e,stroke:#c2410c,color:#e0e0e0
    style R2 fill:#16213e,stroke:#c2410c,color:#e0e0e0
    style R3 fill:#16213e,stroke:#c2410c,color:#e0e0e0
    style R4 fill:#16213e,stroke:#c2410c,color:#e0e0e0
    style R5 fill:#16213e,stroke:#c2410c,color:#e0e0e0
    style BUFFER fill:#2d1810,stroke:#c2410c,color:#e0e0e0
    style MODEL fill:#0f3460,stroke:#c2410c,color:#e0e0e0
    style RESPONSE fill:#1e3a2e,stroke:#4ade80,color:#fff
    style CONTEXT fill:#1a1a2e,stroke:#4a4a4a,color:#888
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Rewrite&lt;/strong&gt; — straightforward optimization of clarity and structure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Buffer&lt;/strong&gt; — translation into model-native vocabulary. Uses a separate buffer model (like Phi-3.5-mini) that specializes in prompt structure rather than reasoning.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Auto&lt;/strong&gt; — maximizes quality, depth, and usefulness, including tool-use hints. Injects context about available agents, skills, and workflows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex research&lt;/strong&gt; — structures tasks for external research agents with web search and sandbox.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex analysis&lt;/strong&gt; — structures systematic code investigation with exploration steps and actionable recommendations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The implementation lives in &lt;a href="https://git.sly.so/kade/pi-mono/src/branch/main/packages/coding-agent/src/core/self-slop.ts"&gt;&lt;code&gt;self-slop.ts&lt;/code&gt;&lt;/a&gt; — about 200 lines of rewrite logic, prompt templating, model resolution, and context injection.&lt;/p&gt;
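&lt;p&gt;To make the mode split concrete, here is a minimal sketch of how a mode could select a rewrite instruction. The five mode names come from the diagram; the template wording, the &lt;code&gt;buildRewriteInstruction&lt;/code&gt; helper, and the choice to inject context only in auto mode are assumptions for illustration and are not taken from &lt;code&gt;self-slop.ts&lt;/code&gt;.&lt;/p&gt;

```typescript
// Mode names as they appear in the diagram.
type SlopMode = "rewrite" | "buffer" | "auto" | "codex_research" | "codex_analysis";

// Invented template wording, one goal per mode.
const REWRITE_GOALS: { [M in SlopMode]: string } = {
  rewrite: "Rewrite the prompt for clarity and structure.",
  buffer: "Translate the prompt into model-native vocabulary.",
  auto: "Rewrite the prompt to maximize quality, depth, and usefulness; add tool-use hints.",
  codex_research: "Restate the prompt as a structured research task with constraints.",
  codex_analysis: "Restate the prompt as code investigation steps with expected artifacts.",
};

// Build the instruction handed to the rewriting model.
// Assumption: only "auto" receives the injected context lines
// (available agents, skills, workflows).
function buildRewriteInstruction(mode: SlopMode, userPrompt: string, context: string[]): string {
  const parts = [REWRITE_GOALS[mode]];
  if (mode === "auto") {
    for (const item of context) {
      parts.push("Available: " + item);
    }
  }
  parts.push("Prompt: " + userPrompt);
  return parts.join("\n");
}
```

&lt;p&gt;Keeping the per-mode goals in one table means adding a sixth mode is a one-line change plus a new union member, which the compiler then enforces.&lt;/p&gt;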
&lt;h2&gt;The buffer model insight&lt;/h2&gt;
&lt;p&gt;The more interesting variant is the buffer model: using a &lt;em&gt;different&lt;/em&gt; model for rewriting than for answering. The intuition comes from model specialization.&lt;/p&gt;
&lt;p&gt;A smaller, faster model can serve as a buffer, rewriting prompts quickly and cheaply before passing them to a larger, more capable model for the actual response. The buffer model doesn't need to be strong at reasoning; it needs to be strong at &lt;em&gt;prompt structure&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;This creates a division of labor within the inference stack:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Buffer model&lt;/strong&gt; — prompt optimization, low cost, fast. Specializes in understanding prompt structure and generating effective rewrites.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Primary model&lt;/strong&gt; — response generation, higher cost, higher quality. Gets an optimized prompt and produces a better response.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The buffer model acts like a compiler front-end: it turns loosely written source into a well-formed intermediate representation that the back-end can translate efficiently. In the same way, the buffer model transforms vague human intent into structured, model-native prompts for the primary model.&lt;/p&gt;
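&lt;p&gt;The two-stage split can be sketched as a small pipeline. Everything here is hypothetical: &lt;code&gt;Model&lt;/code&gt; stands in for a chat-completion call, &lt;code&gt;answerViaBuffer&lt;/code&gt; is an invented name, and the fallback to the original prompt when the rewrite comes back empty is an assumption, not documented behavior of the &lt;code&gt;r&lt;/code&gt; harness.&lt;/p&gt;

```typescript
// Hypothetical stand-in for a chat-completion call: prompt in, text out.
type Model = (prompt: string) => string;

interface BufferedResult {
  sentPrompt: string;   // what the primary model actually received
  response: string;     // the primary model's answer
  usedRewrite: boolean; // false when the fallback path was taken
}

// Stage 1: a small, cheap model rewrites the prompt.
// Stage 2: the primary model answers the rewritten prompt.
// If the rewrite is empty, fall back to the user's original wording.
function answerViaBuffer(userPrompt: string, buffer: Model, primary: Model): BufferedResult {
  const rewritten = buffer("Rewrite this prompt for another model:\n" + userPrompt).trim();
  const usedRewrite = rewritten.length > 0;
  const sentPrompt = usedRewrite ? rewritten : userPrompt;
  return { sentPrompt, response: primary(sentPrompt), usedRewrite };
}
```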
&lt;h2&gt;Connection to agentic loops&lt;/h2&gt;
&lt;p&gt;Self-slop is a small instance of a larger pattern: the agentic loop. An agent system that can optimize its own inputs is more capable than one that can't.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;r&lt;/code&gt; harness implements this through a subagent orchestration layer ("The Skulk") with specialized agents:&lt;/p&gt;
&lt;div class="mermaid"&gt;
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#c2410c', 'edgeLabelBackground':'#c2410c', 'tertiaryColor': '#7c3a1a'}}}%%
flowchart TD
    REYNARD["Reynard\n(Overseer)"]

    subgraph SCOUT["Reconnaissance"]
        PROWL["Prowl\n(Scout)"]
    end

    subgraph PLAN["Planning"]
        MAPPER["Mapper\n(Planner)"]
    end

    subgraph BUILD["Implementation"]
        FORGE["Forge\n(Implementer)"]
    end

    subgraph REVIEW["Review"]
        SHREWD["Shrewd\n(Reviewer)"]
        JINX["Jinx\n(Critic)"]
    end

    subgraph TEST["Testing"]
        TRIAL["Trial\n(Tester)"]
    end

    subgraph SUPPORT["Support"]
        KIT["Kit\n(Apprentice)"]
        VIXEN["Vixen\n(Analyst)"]
        BRUSH["Brush\n(Refactorer)"]
        ECHO["Echo\n(Observer)"]
    end

    REYNARD --&gt; SCOUT
    REYNARD --&gt; PLAN
    REYNARD --&gt; BUILD
    REYNARD --&gt; REVIEW
    REYNARD --&gt; TEST
    REYNARD --&gt; SUPPORT

    style REYNARD fill:#c2410c,color:#fff
    style SCOUT fill:#1a1a2e,stroke:#c2410c,color:#fff
    style PLAN fill:#1a1a2e,stroke:#c2410c,color:#fff
    style BUILD fill:#1a1a2e,stroke:#c2410c,color:#fff
    style REVIEW fill:#1a1a2e,stroke:#c2410c,color:#fff
    style TEST fill:#1a1a2e,stroke:#c2410c,color:#fff
    style SUPPORT fill:#1a1a2e,stroke:#4a4a4a,color:#888
&lt;/div&gt;

&lt;p&gt;Agents run in three modes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Single&lt;/strong&gt; — one agent handles one task.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Parallel&lt;/strong&gt; — fan out up to 7 agents, max 4 concurrent. Worktree isolation for parallel tasks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Chain&lt;/strong&gt; — sequential, where one agent's output feeds the next step via &lt;code&gt;{previous}&lt;/code&gt; substitution.&lt;/li&gt;
&lt;/ul&gt;
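&lt;p&gt;Chain mode is the easiest of the three to illustrate. The sketch below assumes &lt;code&gt;{previous}&lt;/code&gt; is a literal placeholder replaced by the prior step's output; &lt;code&gt;ChainStep&lt;/code&gt;, &lt;code&gt;RunAgent&lt;/code&gt;, and &lt;code&gt;runChain&lt;/code&gt; are hypothetical names, and the agent call is an injected stand-in.&lt;/p&gt;

```typescript
interface ChainStep {
  agent: string; // e.g. "Prowl" or "Mapper"
  task: string;  // may contain the literal placeholder {previous}
}

// Hypothetical stand-in for dispatching a task to a named subagent.
type RunAgent = (agent: string, task: string) => string;

// Run steps in order, replacing every {previous} in a step's task with the
// output of the step before it. The first step sees an empty string.
function runChain(steps: ChainStep[], runAgent: RunAgent): string {
  let previous = "";
  for (const step of steps) {
    const task = step.task.split("{previous}").join(previous);
    previous = runAgent(step.agent, task);
  }
  return previous;
}
```

&lt;p&gt;A two-step chain such as a Prowl scan feeding a Mapper plan would pass &lt;code&gt;"plan from {previous}"&lt;/code&gt; as the second task.&lt;/p&gt;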
&lt;p&gt;Self-slop fits into this as a preprocessing stage: before the agent even starts working, its prompt gets optimized. It's one layer in a stack of optimization layers.&lt;/p&gt;
&lt;h2&gt;Compaction and context management&lt;/h2&gt;
&lt;p&gt;Self-slop addresses input quality. Compaction addresses context longevity. Both are necessary for long-running agentic sessions.&lt;/p&gt;
&lt;div class="mermaid"&gt;
flowchart TD
    SESSION["Agent Session"]

    subgraph CONTINUOUS["Continuous Compaction"]
        INTERVAL["Every 5 turns\n(configurable)"]
        LIGHTWEIGHT["Lightweight summarization"]
        SMALL["Small model\nPhi-3.5-mini / LFM2.5-350M\nSmolLM2-135M"]
    end

    subgraph PRESSURE["Pressure-Based Compaction"]
        WARNING["Warning: preload model\nenable tool output summarization"]
        CRITICAL["Critical: reduce verbosity\nsave compute resources"]
        EMERGENCY["Emergency: immediate\naggressive compaction"]
    end

    SESSION --&gt; CONTINUOUS
    SESSION --&gt; PRESSURE

    INTERVAL --&gt; LIGHTWEIGHT --&gt; SMALL

    WARNING --&gt; CRITICAL --&gt; EMERGENCY

    style SESSION fill:#c2410c,color:#fff
    style CONTINUOUS fill:#1a1a2e,stroke:#4ade80,color:#fff
    style PRESSURE fill:#1a1a2e,stroke:#dc2626,color:#fff
    style INTERVAL fill:#16213e,stroke:#4ade80,color:#e0e0e0
    style LIGHTWEIGHT fill:#16213e,stroke:#4ade80,color:#e0e0e0
    style SMALL fill:#16213e,stroke:#4ade80,color:#e0e0e0
    style WARNING fill:#16213e,stroke:#f59e0b,color:#e0e0e0
    style CRITICAL fill:#16213e,stroke:#dc2626,color:#e0e0e0
    style EMERGENCY fill:#16213e,stroke:#dc2626,color:#e0e0e0
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Continuous compaction&lt;/strong&gt; keeps the session lean:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Triggers every 5 turns (configurable).&lt;/li&gt;
&lt;li&gt;Lightweight summarization via small models.&lt;/li&gt;
&lt;li&gt;Dedicated trim budget (8192 tokens).&lt;/li&gt;
&lt;li&gt;Keeps the structured anchor fresh and avoids the "Lost in the Middle" phenomenon.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Pressure-based compaction&lt;/strong&gt; responds to context limits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Warning&lt;/strong&gt; — preloads the compaction model and enables tool output summarization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Critical&lt;/strong&gt; — reduces overall summarization verbosity to save compute resources.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Emergency&lt;/strong&gt; — forces immediate, aggressive compaction.&lt;/li&gt;
&lt;/ul&gt;
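&lt;p&gt;The two triggers can be sketched as a pair of pure functions. The 5-turn default comes from the list above; the pressure thresholds (70%, 85%, 95% of the context window) and all names are illustrative assumptions, since the post does not state the real cut-offs.&lt;/p&gt;

```typescript
type Pressure = "normal" | "warning" | "critical" | "emergency";

// Illustrative thresholds as fractions of the context window.
// The real cut-offs are not stated in the post.
const THRESHOLDS = { warning: 0.7, critical: 0.85, emergency: 0.95 };

// Map current context usage to a pressure level, checking the most
// severe level first.
function pressureLevel(usedTokens: number, contextWindow: number): Pressure {
  const ratio = usedTokens / contextWindow;
  if (ratio >= THRESHOLDS.emergency) return "emergency";
  if (ratio >= THRESHOLDS.critical) return "critical";
  if (ratio >= THRESHOLDS.warning) return "warning";
  return "normal";
}

// Continuous compaction fires on a fixed turn interval (5 by default),
// never on turn 0.
function shouldCompactContinuously(turn: number, interval: number = 5): boolean {
  if (turn === 0) return false;
  return turn % interval === 0;
}
```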
&lt;p&gt;The compaction fleet runs across mesh nodes:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Host&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Context (tokens)&lt;/th&gt;
&lt;th&gt;Max Tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;otter_den&lt;/td&gt;
&lt;td&gt;LFM2.5-350M-Q8_0&lt;/td&gt;
&lt;td&gt;262144&lt;/td&gt;
&lt;td&gt;8192&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;inkwell&lt;/td&gt;
&lt;td&gt;qwen2.5-1.5b-instruct Q3_K_S&lt;/td&gt;
&lt;td&gt;262144&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;laptop&lt;/td&gt;
&lt;td&gt;SmolLM2-135M-Instruct-Q8_0&lt;/td&gt;
&lt;td&gt;32768&lt;/td&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The agent routes compaction requests based on service priority and proximity (&lt;code&gt;isLocal&lt;/code&gt;). A circuit breaker handles automatic failover on endpoint failure.&lt;/p&gt;
&lt;p&gt;This matters because context window exhaustion is the primary failure mode for long-running agents. Without compaction, sessions degrade as the window fills. With it, session length is bounded by compute rather than by context size.&lt;/p&gt;
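&lt;p&gt;A minimal sketch of priority routing with a circuit breaker follows. Only &lt;code&gt;isLocal&lt;/code&gt; and the host names come from the post; the &lt;code&gt;Endpoint&lt;/code&gt; shape, the rule that lower priority numbers win, the failure threshold of 3, and the function names are assumptions made for illustration.&lt;/p&gt;

```typescript
interface Endpoint {
  host: string;
  priority: number; // assumption: lower number means preferred
  isLocal: boolean;
  failures: number; // consecutive failures seen so far
}

// Hypothetical threshold: after this many consecutive failures the
// endpoint's breaker is treated as open and the endpoint is skipped.
const MAX_FAILURES = 3;

// Keep healthy endpoints, order by priority, prefer local ones on ties.
function rankEndpoints(endpoints: Endpoint[]): Endpoint[] {
  return endpoints
    .filter((e) => MAX_FAILURES > e.failures)
    .sort((a, b) => a.priority - b.priority || Number(b.isLocal) - Number(a.isLocal));
}

// Hypothetical stand-in for the call to a compaction model; throws on failure.
type Summarize = (host: string, text: string) => string;

// Try endpoints in ranked order, recording failures and failing over.
function compactWithFailover(endpoints: Endpoint[], text: string, summarize: Summarize): string {
  for (const endpoint of rankEndpoints(endpoints)) {
    try {
      const summary = summarize(endpoint.host, text);
      endpoint.failures = 0; // a success closes the breaker again
      return summary;
    } catch {
      endpoint.failures += 1;
    }
  }
  throw new Error("no healthy compaction endpoint available");
}
```

&lt;p&gt;A production breaker would usually also reopen endpoints after a cool-down period; that is left out here to keep the sketch short.&lt;/p&gt;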
&lt;h2&gt;Loop detection&lt;/h2&gt;
&lt;p&gt;An agentic system that can optimize its own inputs and manage its own context needs safeguards against degenerative behavior. Loop detection prevents three classes of failure:&lt;/p&gt;
&lt;div class="mermaid"&gt;
flowchart TD
    SUBJECT["Agentic System"]

    subgraph RECURSION["Agent Recursion Loops"]
        R1["Subagent A spawns Subagent B"]
        R2["Subagent B spawns Subagent C"]
        R3["Subagent C spawns Subagent A\nDEGENERATIVE"]
    end

    subgraph GENERATION["Generation Repetition Loops"]
        G1["Model produces output"]
        G2["Output repeats pattern"]
        G3["Pattern repeats indefinitely\nDEGENERATIVE"]
    end

    subgraph INFRASTRUCTURE["Infrastructure Loops"]
        I1["Benchmark runs"]
        I2["No progress detected"]
        I3["Script spins without end\nDEGENERATIVE"]
    end

    SUBJECT --&gt; RECURSION
    SUBJECT --&gt; GENERATION
    SUBJECT --&gt; INFRASTRUCTURE

    RECURSION --&gt;|Guard: depth limit + dedup| SAFE1["Safe: bounded recursion"]
    GENERATION --&gt;|Guard: penalties + window detection| SAFE2["Safe: varied output"]
    INFRASTRUCTURE --&gt;|Guard: audit + test plans| SAFE3["Safe: controlled execution"]

    style SUBJECT fill:#c2410c,color:#fff
    style RECURSION fill:#1a1a2e,stroke:#dc2626,color:#fff
    style GENERATION fill:#1a1a2e,stroke:#dc2626,color:#fff
    style INFRASTRUCTURE fill:#1a1a2e,stroke:#dc2626,color:#fff
    style SAFE1 fill:#1e3a2e,stroke:#4ade80,color:#fff
    style SAFE2 fill:#1e3a2e,stroke:#4ade80,color:#fff
    style SAFE3 fill:#1e3a2e,stroke:#4ade80,color:#fff
&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Agent recursion&lt;/strong&gt; — subagents spawning subagents indefinitely, consuming context and compute. Guarded by depth limits (max 3 levels), compaction retry limits (max 6 retries, 120s timeout), test turn limits (max 5 turns), and task deduplication via SHA-256 hashing of agent+task pairs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Generation repetition&lt;/strong&gt; — models producing degenerate repeating output (the classic "blah blah blah" failure mode). Guarded by sampler penalties (repeat penalty, frequency penalty, presence penalty, DRY multiplier), prompt-loop detection with sliding windows and escalation, and drift correction with periodic checks for output divergence.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Infrastructure loops&lt;/strong&gt; — benchmark and deployment scripts spinning without progress. Guarded by configuration audits, test plans, and PEG parser epsilon guards with zero-progress break conditions.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These are not optional. An agentic system without loop detection will eventually loop. The question is only when and how badly.&lt;/p&gt;
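&lt;p&gt;The recursion guard is the simplest of the three to sketch. The depth limit of 3 and SHA-256 deduplication of agent+task pairs come from the list above; the &lt;code&gt;SpawnGuard&lt;/code&gt; class, the newline separator inside the hash input, and the return convention are hypothetical. &lt;code&gt;createHash&lt;/code&gt; is the standard Node.js &lt;code&gt;node:crypto&lt;/code&gt; API.&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

const MAX_DEPTH = 3; // depth limit stated in the post

// SHA-256 over the agent+task pair, used as a deduplication key.
// The newline separator is an assumption; it keeps ("ab", "c") distinct
// from ("a", "bc") as long as agent names contain no newline.
function taskKey(agent: string, task: string): string {
  return createHash("sha256").update(agent + "\n" + task).digest("hex");
}

class SpawnGuard {
  private readonly seen = new Set();

  // Returns null when the spawn is allowed, or a reason string when refused.
  check(agent: string, task: string, depth: number): string | null {
    if (depth > MAX_DEPTH) return "depth limit exceeded";
    const key = taskKey(agent, task);
    if (this.seen.has(key)) return "duplicate agent+task pair";
    this.seen.add(key);
    return null;
  }
}
```

&lt;p&gt;With this shape, the A to B to C to A cycle from the diagram is refused as soon as an agent is asked to repeat a task it was already given, and unrelated runaway spawning stops at the fourth level.&lt;/p&gt;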
&lt;h2&gt;The broader point&lt;/h2&gt;
&lt;p&gt;Self-slop is a concrete implementation of a general principle: systems that can optimize their own operation outperform systems that can't.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The model that optimizes its own prompts produces better responses.&lt;/li&gt;
&lt;li&gt;The agent that compacts its own context runs longer without degradation.&lt;/li&gt;
&lt;li&gt;The system that detects its own loops avoids degenerative behavior.&lt;/li&gt;
&lt;li&gt;The orchestrator that delegates to specialized agents solves more problems than a generalist.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each layer adds capability. Each layer adds complexity. The tradeoff is real, but the direction is clear.&lt;/p&gt;
&lt;h2&gt;What this means&lt;/h2&gt;
&lt;p&gt;Self-slop is not slop. It is not model collapse or synthetic-data degeneracy. It is the &lt;em&gt;deliberate exploitation of distributional alignment&lt;/em&gt; between prompt and model. When a practitioner buffers their human intent through a model-native rewriter, they are not saving tokens; they are &lt;strong&gt;speaking the model's first language&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The implications are economic as well as epistemic. As LLMs become the primary epistemic intermediaries for an increasing share of knowledge work, the interface language matters. Human prompt engineering — the art of guessing what phrasing will activate the model's capabilities — is being outcompeted by model-native optimization, which knows the answer because it &lt;em&gt;is&lt;/em&gt; the model.&lt;/p&gt;
&lt;p&gt;The future belongs not to the best human prompt engineer, but to the best &lt;em&gt;model whisperer&lt;/em&gt;: the practitioner who knows how to ask the model to ask itself.&lt;/p&gt;
&lt;h2&gt;Where this goes&lt;/h2&gt;
&lt;p&gt;This blog covers research on latent resonance, prompt engineering, model behavior, and distributed systems. Future posts will dig deeper into specific areas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prompt template architecture and the geometry of effective prompting.&lt;/li&gt;
&lt;li&gt;Distributed compaction across mesh networks.&lt;/li&gt;
&lt;li&gt;Loop detection mechanisms and their failure modes.&lt;/li&gt;
&lt;li&gt;Empirical measurement of self-slop effect sizes.&lt;/li&gt;
&lt;/ul&gt;</content><category term="Research"/><category term="selfslop"/><category term="prompt engineering"/><category term="agentic systems"/><category term="latent resonance"/><category term="model behavior"/></entry></feed>