<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Hugues Clouâtre - AI &amp; Platform Engineering</title><description>AI and platform engineering insights from an AWS veteran and technology executive. Practical guides for CTOs and engineering leaders.</description><link>https://clouatre.ca/</link><item><title>SRE for AI Agents: Error Budgets, Trust Ladders, and 90 Trials</title><link>https://clouatre.ca/posts/sre-ai-agents-production/</link><guid isPermaLink="true">https://clouatre.ca/posts/sre-ai-agents-production/</guid><description>Can an AI agent predict scope without hallucinating? We ran 90 trials. It added 1.7 phantom files per change. Error budgets and trust ladders are the gate.</description><pubDate>Mon, 09 Mar 2026 17:36:00 GMT</pubDate><content:encoded>&lt;p&gt;AI tooling budgets hit record highs. We ran 90 file-prediction trials to measure what an AI agent gets wrong before it touches production. The model predicted 1.7 files beyond the actual change set on average, even on a well-structured codebase. Meanwhile, operational toil is climbing for the first time in years. SRE is not ceremony. It is the empirical gate between velocity and blast radius.&lt;/p&gt;
&lt;h2 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;&lt;details&gt;&lt;summary&gt;Contents&lt;/summary&gt;&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-is-ai-widening-the-devops-gap&quot;&gt;Why Is AI Widening the Dev/Ops Gap?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-perception-gap-dashboards-vs-practitioner-reality&quot;&gt;The Perception Gap: Dashboards vs. Practitioner Reality&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-did-we-measure-ais-scope-creep&quot;&gt;How Did We Measure AI’s Scope Creep?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-medium-tier-prs-underperformed&quot;&gt;Why Medium-Tier PRs Underperformed&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-does-sre-mean-in-a-regulated-enterprise&quot;&gt;What Does SRE Mean in a Regulated Enterprise?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#error-budgets-as-compliance-evidence&quot;&gt;Error Budgets as Compliance Evidence&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-does-sre-act-as-ais-production-conscience&quot;&gt;How Does SRE Act as AI’s Production Conscience?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#decision-provenance-and-error-budget-separation&quot;&gt;Decision Provenance and Error Budget Separation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#blast-radius-containment-and-the-trust-ladder&quot;&gt;Blast Radius Containment and the Trust Ladder&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-does-platform-maturity-gate-ai-readiness&quot;&gt;Why Does Platform Maturity Gate AI Readiness?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-learning-time-deficit&quot;&gt;The Learning Time Deficit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#where-should-you-start&quot;&gt;Where Should You Start?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-four-actions-in-order&quot;&gt;The Four Actions in Order&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;&lt;/details&gt;&lt;p&gt;&lt;/p&gt;
&lt;h2 id=&quot;why-is-ai-widening-the-devops-gap&quot;&gt;Why Is AI Widening the Dev/Ops Gap?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Toil&lt;/strong&gt; is the repetitive, manual operational work that scales with system load rather than adding lasting value. Catchpoint’s annual SRE reports tracked toil &lt;a href=&quot;https://www.catchpoint.com/learn/sre-report-2026&quot;&gt;rising from 25% to 34% between 2024 and 2026&lt;/a&gt;, the first sustained increase since the survey began in 2020. DORA’s 2025 report independently confirmed that AI tooling has not reduced operational toil. Over the same period, &lt;a href=&quot;https://devops.com/survey-ai-tools-are-increasing-amount-of-bad-code-needing-to-be-fixed-2/&quot;&gt;92% of developers&lt;/a&gt; report that AI tools increase the blast radius of bad code needing to be debugged (DevOps.com, 2025). AI-generated code flows through pipelines designed for human-paced change, and the &lt;strong&gt;human-in-the-loop&lt;/strong&gt; guardrails have not kept up. &lt;a href=&quot;/posts/ai-augmented-cicd&quot;&gt;Defensive pipeline architectures&lt;/a&gt; can close part of this gap, but they address the pipeline, not the production governance layer.&lt;/p&gt;
&lt;p&gt;Three root causes keep surfacing in post-mortems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AI babysitting:&lt;/strong&gt; someone has to review generated runbooks and roll back missed-context remediations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Configuration drift:&lt;/strong&gt; accelerating faster than humans can audit it&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Validation overhead:&lt;/strong&gt; compounding because every AI output needs a trust-but-verify pass before production&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;More frequent deploys, multiplied by more autonomous agents, against review capacity that has not scaled to match. &lt;a href=&quot;https://arxiv.org/abs/2507.09089&quot;&gt;METR found experienced developers took 19% longer with AI tools&lt;/a&gt; despite perceiving a 20% speedup (Becker et al., 2025). DORA 2025 confirmed faster deployment frequency but flat organizational delivery.&lt;/p&gt;
&lt;h3 id=&quot;the-perception-gap-dashboards-vs-practitioner-reality&quot;&gt;The Perception Gap: Dashboards vs. Practitioner Reality&lt;/h3&gt;
&lt;p&gt;The perception gap makes this harder to fix. Directors reviewing dashboards see ticket counts drop and declare victory. Practitioners on the ground feel increased friction because toil shifted from “boring but predictable” to “novel and unpredictable.” Both views are correct, which is exactly why the problem persists.&lt;/p&gt;



































&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Dimension&lt;/th&gt;&lt;th&gt;AI Accelerated&lt;/th&gt;&lt;th&gt;Human Judgment Required&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Code generation&lt;/td&gt;&lt;td&gt;Higher code volume&lt;/td&gt;&lt;td&gt;Architectural review&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Test creation&lt;/td&gt;&lt;td&gt;Unit test scaffolding&lt;/td&gt;&lt;td&gt;Integration test design&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Deploy frequency&lt;/td&gt;&lt;td&gt;Higher deploy cadence&lt;/td&gt;&lt;td&gt;Change risk assessment&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Incident detection&lt;/td&gt;&lt;td&gt;Faster alert correlation&lt;/td&gt;&lt;td&gt;Root cause judgment&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Compliance&lt;/td&gt;&lt;td&gt;Automated scanning&lt;/td&gt;&lt;td&gt;Regulatory interpretation&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 1: AI accelerates delivery tasks (left column), but human judgment gaps (right column) are where incidents originate.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img  loading=&quot;eager&quot; decoding=&quot;sync&quot; fetchpriority=&quot;high&quot; sizes=&quot;(min-width: 436px) 436px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;436&quot; height=&quot;570&quot; src=&quot;/_astro/sre-toil-paradox.6Co_HTpd_TX8eF.webp&quot; srcset=&quot;/_astro/sre-toil-paradox.6Co_HTpd_TX8eF.webp 436w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: AI investment and measured toil both climbing, 2021-2026.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;how-did-we-measure-ais-scope-creep&quot;&gt;How Did We Measure AI’s Scope Creep?&lt;/h2&gt;
&lt;p&gt;Inspired by the &lt;a href=&quot;https://arxiv.org/abs/2310.06770&quot;&gt;SWE-bench methodology&lt;/a&gt; (Jimenez et al., 2024), we designed a file-prediction study: given a GitHub issue description and the repository file tree at a PR’s base commit, can Claude Sonnet 4.6 predict which files a human engineer modified? We targeted &lt;a href=&quot;https://github.com/tobymao/sqlglot&quot;&gt;tobymao/sqlglot&lt;/a&gt;, an MIT-licensed SQL transpiler with 9k+ stars and a predictable dialect-file structure. We curated 30 merged PRs, stratified by complexity (10 simple, 10 medium, 10 complex), and ran 3 predictions per PR for 90 total trials. A single Bedrock API call per prediction: no agent loop, no tools, no retrieval. For detailed methodology and raw data, see &lt;a href=&quot;https://github.com/clouatre-labs/sre-shadow-replay&quot;&gt;Supplementary Materials&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Each prediction was scored against the human’s actual change set using file-level precision, recall, F1, and Jaccard similarity (set overlap: intersection over union). &lt;strong&gt;Scope creep&lt;/strong&gt;, what we term &lt;strong&gt;scope hallucination&lt;/strong&gt;, counts how many extra files the model predicted beyond the human’s set. These are changes the model hallucinated that do not exist in the human’s actual changeset.&lt;/p&gt;





























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Tier&lt;/th&gt;&lt;th&gt;Precision / Recall&lt;/th&gt;&lt;th&gt;F1&lt;/th&gt;&lt;th&gt;Scope Creep&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Simple (1-2 files)&lt;/td&gt;&lt;td&gt;0.645 / 0.850&lt;/td&gt;&lt;td&gt;0.708&lt;/td&gt;&lt;td&gt;1.3 files&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Medium (3-5 files)&lt;/td&gt;&lt;td&gt;0.540 / 0.585&lt;/td&gt;&lt;td&gt;0.552&lt;/td&gt;&lt;td&gt;2.2 files&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Complex (6-15 files)&lt;/td&gt;&lt;td&gt;0.769 / 0.673&lt;/td&gt;&lt;td&gt;0.712&lt;/td&gt;&lt;td&gt;1.6 files&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 2: Results by complexity tier, 30 PRs x 3 runs = 90 total predictions. F1 is the harmonic mean of precision and recall. &lt;a href=&quot;https://github.com/clouatre-labs/sre-shadow-replay#main-results&quot;&gt;Full methodology, metrics, and per-tier results&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;why-medium-tier-prs-underperformed&quot;&gt;Why Medium-Tier PRs Underperformed&lt;/h3&gt;
&lt;p&gt;The non-monotonic curve is the headline finding: Jaccard similarity was 0.60 (simple), 0.41 (medium), and 0.58 (complex). Medium-tier PRs underperformed both simple and complex tiers. The model struggles most with moderate-scope changes, not the largest ones. Medium PRs occupy the worst of both tiers. Too many candidate files to guess by elimination (as with simple PRs), but not enough structural regularity for the model to infer the set from sqlglot’s dialect-file conventions (as with complex PRs). Complex PRs had the highest precision, likely because sqlglot’s rigid directory structure (dialect file plus corresponding test file) makes multi-file change sets predictable from the dialect name alone. 12 of 30 PRs failed (Jaccard &amp;#x3C; 0.5), spread across all tiers; no tier is immune. In a shadow-mode deployment, where the agent proposes changes but a human applies them, every over-predicted file is a false positive the reviewer filters out before the change reaches production.&lt;/p&gt;
&lt;h2 id=&quot;what-does-sre-mean-in-a-regulated-enterprise&quot;&gt;What Does SRE Mean in a Regulated Enterprise?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;SRE&lt;/strong&gt; is not DevOps with a different name. The distinction is structural. In regulated financial services, production reliability carries regulatory weight: &lt;a href=&quot;https://www.osfi-bsif.gc.ca/en/guidance/guidance-library/technology-cyber-risk-management&quot;&gt;OSFI’s B-13 guideline&lt;/a&gt; mandates technology risk management with board-level accountability, and the &lt;a href=&quot;https://www.digital-operational-resilience-act.com/&quot;&gt;EU’s DORA regulation&lt;/a&gt; sets equivalent requirements for operational resilience across European financial services (European Parliament and Council, 2022). The compliance surface extends beyond infrastructure to &lt;a href=&quot;/posts/ai-supply-chain-attack-vectors&quot;&gt;AI-driven supply chain risks&lt;/a&gt; that traditional dependency scanning does not catch.&lt;/p&gt;
&lt;h3 id=&quot;error-budgets-as-compliance-evidence&quot;&gt;Error Budgets as Compliance Evidence&lt;/h3&gt;
&lt;p&gt;SRE answers this with a reliability contract. &lt;strong&gt;Error budgets&lt;/strong&gt; define how much unreliability a service can tolerate before feature work stops. &lt;strong&gt;SLOs&lt;/strong&gt; (service level objectives) make reliability measurable rather than aspirational. Blameless postmortems treat incidents as system failures, not personnel failures. Google &lt;a href=&quot;https://sre.google/sre-book/table-of-contents/&quot;&gt;codified this framework in 2016&lt;/a&gt; and enterprises have since adapted it, but in regulated environments the stakes include regulatory censure, not just customer churn.&lt;/p&gt;
&lt;p&gt;Platform Engineering provides capability: the tools, the internal developer platform, the golden paths. Where &lt;strong&gt;model governance&lt;/strong&gt; and operational resilience frameworks intersect with SRE practices, the regulatory surface extends from infrastructure to inference. SRE provides accountability: the error budgets, the incident response, the production governance. The question is whether that accountability holds when the agent making changes is not human.&lt;/p&gt;
&lt;h2 id=&quot;how-does-sre-act-as-ais-production-conscience&quot;&gt;How Does SRE Act as AI’s Production Conscience?&lt;/h2&gt;
&lt;p&gt;Deploying an AI agent is not a monitoring problem. It is a reliability problem. Monitoring tells you something broke. A reliability framework tells you how much breakage you can tolerate, who caused it, and whether to keep going.&lt;/p&gt;
&lt;h3 id=&quot;decision-provenance-and-error-budget-separation&quot;&gt;Decision Provenance and Error Budget Separation&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;/posts/ai-observability-gaps/#what-is-decision-provenance-and-why-does-compliance-require-it&quot;&gt;&lt;strong&gt;Decision provenance&lt;/strong&gt;&lt;/a&gt;, the &lt;strong&gt;AI observability&lt;/strong&gt; requirement that every agent action links to its inputs, reasoning, and authorization chain, goes beyond logging what an agent did. You need to trace &lt;em&gt;why&lt;/em&gt; it made a choice, what context it consumed, and which prior decisions influenced the outcome. Without this, debugging an autonomous system is archaeology, not engineering. Under OSFI B-13 and DORA, an agent action without decision provenance is not just a debugging gap; it is a compliance liability.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Separate error budgets for AI-generated changes&lt;/strong&gt; keep machine-authored deployments from hiding behind human baselines. If an AI agent burns through its error budget, its write permissions get revoked automatically, not the entire team’s. Our results showed 0.769 precision even on the best-performing tier, meaning roughly 1 in 4 predicted files was wrong. That error rate needs its own budget.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;groups&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; sli.deploys&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    rules&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; record&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; sli:deploy_success:ratio1h&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        expr&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; |&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          sum(rate(deploy_success_total{author_type=&quot;ai&quot;}[1h]))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          / sum(rate(deploys_total{author_type=&quot;ai&quot;}[1h]))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        labels&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          author_type&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ai&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          slo_target&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;0.995&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt; # Stricter than human baseline of 0.990&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; record&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; sli:deploy_success:ratio1h&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        expr&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; |&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          sum(rate(deploy_success_total{author_type=&quot;human&quot;}[1h]))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          / sum(rate(deploys_total{author_type=&quot;human&quot;}[1h]))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        labels&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          author_type&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; human&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          slo_target&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;0.990&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; alert&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; AIChangeErrorBudgetBurnRate&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        expr&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; |&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          (1 - sli:deploy_success:ratio1h{author_type=&quot;ai&quot;})&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          / (1 - 0.995) &gt; 14.4&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        for&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; 5m&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        labels&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          severity&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; critical&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          team&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; sre&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;sli-and-burn-rate.yaml&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 1: Prometheus recording rules and burn-rate alert for AI-authored deployments. Separate SLIs per author type; 14.4x burn-rate threshold from the Google SRE Workbook.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/2512.04123&quot;&gt;Production teams consistently trade agent capability for reliability&lt;/a&gt;, preferring narrower but predictable automation over broad but brittle autonomy (Pan et al., 2026). Separate error budgets formalize that trade-off.&lt;/p&gt;
&lt;h3 id=&quot;blast-radius-containment-and-the-trust-ladder&quot;&gt;Blast Radius Containment and the Trust Ladder&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Blast radius containment&lt;/strong&gt; means progressive rollout gates. No agent ships to 100% on day one. The &lt;strong&gt;trust ladder&lt;/strong&gt;, a graduated set of &lt;strong&gt;AI guardrails&lt;/strong&gt; where each rung grants broader blast radius only after the agent demonstrates reliability at the current level, makes this concrete: start with read-only agents, graduate to shadow mode for 30 days where the agent proposes but a human applies, then supervised write access, and finally autonomous operation once the agent sustains consistent accuracy against your SLOs.&lt;/p&gt;
&lt;p&gt;Our experiment is a proxy for what shadow mode catches. At the medium tier, where Jaccard dropped to 0.409, shadow mode would have flagged more than half the predicted change set as incorrect. The specific SLO threshold is yours to define; what matters is that it is explicit, measured, and tied to your error budget rather than a gut feeling.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;apiVersion&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; rbac.authorization.k8s.io/v1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;kind&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ClusterRole&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;metadata&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ai-agent-readonly&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;rules&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; apiGroups&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    resources&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;pods&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; services&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; configmaps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    verbs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;get&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; list&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; watch&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; apiGroups&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;apps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    resources&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;deployments&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; replicasets&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    verbs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;get&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; list&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; watch&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;---&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;apiVersion&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; rbac.authorization.k8s.io/v1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;kind&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ClusterRole&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;metadata&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ai-agent-scoped-write&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;rules&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; apiGroups&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    resources&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;pods&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; services&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; configmaps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    verbs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;get&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; list&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; watch&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; apiGroups&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;apps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    resources&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;deployments&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    verbs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;get&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; list&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; watch&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; update&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; patch&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    resourceNames&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;canary-payments&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;---&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;apiVersion&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; rbac.authorization.k8s.io/v1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;kind&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ClusterRole&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;metadata&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ai-agent-production-write&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;rules&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; apiGroups&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    resources&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;pods&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; services&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; configmaps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    verbs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;get&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; list&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; watch&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; create&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; update&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; patch&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; delete&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; apiGroups&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;apps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    resources&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;deployments&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; replicasets&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    verbs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;get&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; list&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; watch&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; create&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; update&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; patch&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; delete&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;sre/trust-ladder-rbac.yaml&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 2: Kubernetes RBAC ClusterRoles for each trust ladder tier. Promotion from readonly to scoped-write to production-write is a ServiceAccount rebinding; demotion reverses it.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img  loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; sizes=&quot;(min-width: 197px) 197px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;197&quot; height=&quot;454&quot; src=&quot;/_astro/sre-trust-ladder.DEGbWukv_1w9EXL.webp&quot; srcset=&quot;/_astro/sre-trust-ladder.DEGbWukv_1w9EXL.webp 197w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Trust ladder for agentic AI: read-only, shadow mode, supervised write, autonomous.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/2602.16666&quot;&gt;Accuracy alone cannot distinguish&lt;/a&gt; an agent that fails on a fixed subset of tasks from one that fails unpredictably at the same rate (Rabanser et al., 2026). Our 90 runs confirmed consistency (27 of 30 PRs showed zero variance across runs, 3 showed near-zero) but exposed robustness and safety gaps on complex refactoring tasks. Consistent failures are exactly what shadow mode is designed to catch: the model’s errors are systematic, not random, and a human reviewer can filter them. Early evidence supports this approach: &lt;a href=&quot;https://arxiv.org/abs/2506.02009&quot;&gt;STRATUS&lt;/a&gt;, a multi-agent SRE system operating under similar progressive constraints, achieved a 1.5x improvement over baselines in automated failure mitigation (Chen et al., 2025).&lt;/p&gt;
&lt;h2 id=&quot;why-does-platform-maturity-gate-ai-readiness&quot;&gt;Why Does Platform Maturity Gate AI Readiness?&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://dora.dev/research/2025/dora-report/&quot;&gt;The 2025 DORA report is explicit&lt;/a&gt;: AI’s impact depends on the quality of the underlying organizational system. Bolt AI onto a fragile platform and you get faster fragility. An AI agent that auto-scales a misconfigured service does not fix the misconfiguration; it scales the blast radius. The &lt;a href=&quot;https://arxiv.org/abs/2501.06706&quot;&gt;AIOpsLab framework&lt;/a&gt; shows agent performance varies significantly with the quality of instrumented infrastructure underneath (Chen et al., 2025).&lt;/p&gt;
&lt;p&gt;The maturity sequence matters. Build the &lt;strong&gt;IDP&lt;/strong&gt; (internal developer platform) first, layer SRE practices including &lt;strong&gt;LLMOps&lt;/strong&gt; telemetry for token consumption, latency, and decision traces on top, then introduce agentic AI. Skip a step and the agents inherit your tech debt at machine speed.&lt;/p&gt;
&lt;h3 id=&quot;the-learning-time-deficit&quot;&gt;The Learning Time Deficit&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.catchpoint.com/learn/sre-report-2026&quot;&gt;Only 6% of SREs&lt;/a&gt; have dedicated, protected learning time (Catchpoint, 2026). You cannot build an SRE practice when the people staffing it have no time to learn the discipline. Concretely, 10% protected time means one half-day per week where an SRE studies agent failure modes, reviews postmortems from other teams, or shadow-tests a new observability tool without on-call interruptions. The organizations with the lowest toil trends treat learning hours like error budgets: protected, measured, and non-negotiable.&lt;/p&gt;









































&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Maturity Level&lt;/th&gt;&lt;th&gt;Platform State&lt;/th&gt;&lt;th&gt;SRE State&lt;/th&gt;&lt;th&gt;AI Readiness&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Foundation&lt;/td&gt;&lt;td&gt;Manual provisioning&lt;/td&gt;&lt;td&gt;Reactive ops, no SLOs&lt;/td&gt;&lt;td&gt;Not ready&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Standardized&lt;/td&gt;&lt;td&gt;Self-service IDP&lt;/td&gt;&lt;td&gt;SLOs defined, error budgets&lt;/td&gt;&lt;td&gt;Read-only agents&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Measured&lt;/td&gt;&lt;td&gt;Golden paths adopted&lt;/td&gt;&lt;td&gt;Toil tracked, burn-rate alerts&lt;/td&gt;&lt;td&gt;Shadow mode agents&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Optimized&lt;/td&gt;&lt;td&gt;Platform-as-product&lt;/td&gt;&lt;td&gt;Blameless culture, SLO-driven&lt;/td&gt;&lt;td&gt;Supervised write agents&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Autonomous&lt;/td&gt;&lt;td&gt;Full self-service&lt;/td&gt;&lt;td&gt;Proactive reliability&lt;/td&gt;&lt;td&gt;Agentic AI with guardrails&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 3: Platform + SRE maturity levels and what each unlocks.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img  loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; sizes=&quot;(min-width: 244px) 244px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;244&quot; height=&quot;846&quot; src=&quot;/_astro/sre-maturity-sequence.DcPEHYwb_Z1ho2HI.webp&quot; srcset=&quot;/_astro/sre-maturity-sequence.DcPEHYwb_Z1ho2HI.webp 244w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 3: Maturity sequence, platform engineering and SRE prerequisites gate each AI readiness level.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Read-only agents need SLOs because without a defined “good,” the agent cannot distinguish signal from noise. Supervised write agents need blameless culture because humans must feel safe overriding the machine.&lt;/p&gt;
&lt;h2 id=&quot;where-should-you-start&quot;&gt;Where Should You Start?&lt;/h2&gt;
&lt;p&gt;Enforce the prerequisites before enabling agentic AI on any service:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;package sre&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;ai&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;readiness&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; rego&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;v1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;default allow_agentic_ai &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; false&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;allow_agentic_ai &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;if&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;    input&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;slo_defined&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;    input&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;error_budget_policy&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;    input&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;shadow_period_days &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 30&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;    input&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;decision_provenance      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;    input&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;rollback_automated&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;    input&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;toil_measured&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;sre-readiness-check.rego&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 3: OPA policy gate enforcing the minimum bar before any service receives agentic AI write access. Shadow period and decision provenance are the two gates most commonly skipped in practice.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;the-four-actions-in-order&quot;&gt;The Four Actions in Order&lt;/h3&gt;
&lt;p&gt;Four actions, in order:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Audit your toil budget.&lt;/strong&gt; Measure actual toil against perceived toil. If practitioners report higher friction while dashboards show fewer tickets, you have shifted toil rather than eliminated it. That distinction determines whether your AI investment compounds value or accelerates the same failure modes at higher velocity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Define SRE boundaries.&lt;/strong&gt; One team owns the IDP. Another owns the error budgets. Overlap is where accountability dies. Ambiguity here is the single most common reason SRE functions fail to scale in regulated enterprises.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Split error budgets by author type.&lt;/strong&gt; Human-authored and AI-authored deployments have different failure profiles. Track them independently and revoke AI write access when the budget burns too fast.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Protect learning time.&lt;/strong&gt; Budget 10% of engineering hours for skill development or accept compounding operational risk. At 6% of teams with protected learning time, the industry is not doing this, and the toil trend reflects it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Start with the toil audit, the only prerequisite you can measure without instrumentation already in place. Measure first. Then automate.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Becker et al., “Evidence on the Impact of Generative AI on Software Development” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2507.09089&quot;&gt;https://arxiv.org/abs/2507.09089&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Catchpoint, “SRE Report 2026” (2026) — &lt;a href=&quot;https://www.catchpoint.com/learn/sre-report-2026&quot;&gt;https://www.catchpoint.com/learn/sre-report-2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Chen et al., “AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2501.06706&quot;&gt;https://arxiv.org/abs/2501.06706&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Chen et al., “STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2506.02009&quot;&gt;https://arxiv.org/abs/2506.02009&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Clouatre, H., “SRE Shadow Replay: File-Prediction Experiment Data” (2026) — &lt;a href=&quot;https://github.com/clouatre-labs/sre-shadow-replay&quot;&gt;https://github.com/clouatre-labs/sre-shadow-replay&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;DevOps.com, “Survey: AI Tools are Increasing Amount of Bad Code Needing to be Fixed” (2025) — &lt;a href=&quot;https://devops.com/survey-ai-tools-are-increasing-amount-of-bad-code-needing-to-be-fixed-2/&quot;&gt;https://devops.com/survey-ai-tools-are-increasing-amount-of-bad-code-needing-to-be-fixed-2/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;DORA (Google Cloud’s DevOps Research and Assessment), “2025 State of AI-assisted Software Development” (2025) — &lt;a href=&quot;https://dora.dev/research/2025/dora-report/&quot;&gt;https://dora.dev/research/2025/dora-report/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;European Parliament and Council, “Digital Operational Resilience Act (DORA)” (2022) — &lt;a href=&quot;https://www.digital-operational-resilience-act.com/&quot;&gt;https://www.digital-operational-resilience-act.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Google, “Site Reliability Engineering” (2016) — &lt;a href=&quot;https://sre.google/sre-book/table-of-contents/&quot;&gt;https://sre.google/sre-book/table-of-contents/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Jimenez, C. E. et al., “SWE-bench: Can Language Models Resolve Real-World GitHub Issues?” (ICLR 2024) — &lt;a href=&quot;https://arxiv.org/abs/2310.06770&quot;&gt;https://arxiv.org/abs/2310.06770&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OSFI, “Technology and Cyber Risk Management Guideline B-13” (2022) — &lt;a href=&quot;https://www.osfi-bsif.gc.ca/en/guidance/guidance-library/technology-cyber-risk-management&quot;&gt;https://www.osfi-bsif.gc.ca/en/guidance/guidance-library/technology-cyber-risk-management&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Pan et al., “Measuring Agents in Production” (2026) — &lt;a href=&quot;https://arxiv.org/abs/2512.04123&quot;&gt;https://arxiv.org/abs/2512.04123&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Rabanser et al., “Towards a Science of AI Agent Reliability” (2026) — &lt;a href=&quot;https://arxiv.org/abs/2602.16666&quot;&gt;https://arxiv.org/abs/2602.16666&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-engineering</category><category>architecture</category><category>case-study</category><category>evaluation</category><author>Hugues Clouâtre</author></item><item><title>What a Null Result Taught Us About AI Agent Evaluation</title><link>https://clouatre.ca/posts/prompt-repetition-agent-evaluation/</link><guid isPermaLink="true">https://clouatre.ca/posts/prompt-repetition-agent-evaluation/</guid><description>We tested prompt repetition on 20 parallel AI agents. Ceiling effects dominated both experiments. The null result is a finding about evaluation design.</description><pubDate>Thu, 26 Feb 2026 13:34:00 GMT</pubDate><content:encoded>&lt;p&gt;A Google Research paper demonstrates that repeating the entire user prompt verbatim can lift accuracy by up to 76 percentage points at zero output cost. No chain-of-thought overhead. No reasoning budget. Just send the same instruction twice.&lt;/p&gt;
&lt;p&gt;We ran 20 parallel agents across two experiments: 10 per experiment, 5 control vs. 5 treatment, blind-scored against a pre-registered rubric.&lt;/p&gt;
&lt;p&gt;We found nothing. The nothing is the finding.&lt;/p&gt;
&lt;h2 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;&lt;details&gt;&lt;summary&gt;Contents&lt;/summary&gt;&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#what-did-the-paper-claim&quot;&gt;What Did the Paper Claim?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-our-agent-seemed-like-a-good-candidate&quot;&gt;Why Our Agent Seemed Like a Good Candidate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-this-matters-for-engineering-teams&quot;&gt;Why This Matters for Engineering Teams&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-did-we-design-the-test&quot;&gt;How Did We Design the Test?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-happened-in-the-fastmcp-refactor-test&quot;&gt;What Happened in the FastMCP Refactor Test?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#did-a-stricter-methodology-change-the-result&quot;&gt;Did a Stricter Methodology Change the Result?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-infrastructure-confound-did-we-miss&quot;&gt;What Infrastructure Confound Did We Miss?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-did-both-experiments-hit-100&quot;&gt;Why Did Both Experiments Hit 100%?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#where-the-boundary-falls&quot;&gt;Where the Boundary Falls&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-did-we-learn-about-ai-evaluation-design&quot;&gt;What Did We Learn About AI Evaluation Design?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#rubric-design-is-harder-than-experiment-design&quot;&gt;Rubric Design Is Harder Than Experiment Design&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#infrastructure-behavior-is-a-confounder&quot;&gt;Infrastructure Behavior Is a Confounder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#delegate-authoring-has-a-turn-length-problem&quot;&gt;Delegate Authoring Has a Turn-Length Problem&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#when-should-you-use-prompt-repetition&quot;&gt;When Should You Use Prompt Repetition?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#what-transfers-to-your-team&quot;&gt;What Transfers to Your Team&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#did-prompt-repetition-change-anything-else&quot;&gt;Did Prompt Repetition Change Anything Else?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;&lt;/details&gt;&lt;p&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-did-the-paper-claim&quot;&gt;What Did the Paper Claim?&lt;/h2&gt;
&lt;p&gt;A &lt;a href=&quot;https://arxiv.org/abs/2512.14982&quot;&gt;2025 paper by Leviathan et al.&lt;/a&gt; at Google Research proposes a simple technique: repeat the entire user prompt once, verbatim, before sending to the model.&lt;/p&gt;
&lt;p&gt;The mechanism is structural, not empirical. Decoder-only transformers use causal masking: each token attends only to tokens before it. In a single-pass prompt, early tokens never see later context. Repeating the prompt creates a second copy where every token attends to the full instruction during prefill. This reduces the positional attention decay documented as the &lt;a href=&quot;https://arxiv.org/abs/2307.03172&quot;&gt;“lost in the middle” phenomenon&lt;/a&gt; (Liu et al., 2023). This is a fundamental limitation of the decoder-only architecture, not a quirk of specific benchmarks. A 675B-parameter Mixture-of-Experts frontier model and a &lt;a href=&quot;https://arxiv.org/abs/2512.20856&quot;&gt;3B-active-parameter small language model (SLM)&lt;/a&gt; (NVIDIA, 2025) share it equally.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;text&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Standard prompt (single pass):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  Token 1  sees: [Token 1]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  Token 5  sees: [Token 1, 2, 3, 4, 5]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  Token 50 sees: [Token 1, 2, ..., 50]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  --&gt; Early tokens are blind to later context&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Repeated prompt (two copies):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  Token 51 sees: [Token 1, 2, ..., 50, 51]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  Token 55 sees: [Token 1, 2, ..., 50, 51, 52, 53, 54, 55]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span&gt;  --&gt; Every token in the second copy attends to the full first copy&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  --&gt; Full context available during prefill, zero generation cost&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;causal-masking.txt&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 1: Causal masking creates an asymmetry where early tokens cannot attend to later context. Repeating the prompt gives the second copy full visibility over the first.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The reported gains:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gemini 2.0 Flash-Lite on NameIndex: &lt;strong&gt;21.33% to 97.33%&lt;/strong&gt; accuracy&lt;/li&gt;
&lt;li&gt;GSM8K and MMLU-Pro gains across Gemini 2.0 Flash, GPT-4o, Claude 3.7 Sonnet, DeepSeek V3, and others&lt;/li&gt;
&lt;li&gt;Input tokens double; output tokens unchanged in fixed-format benchmarks (no latency increase, unlike chain-of-thought)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The paper positions this as a Pareto improvement over reasoning-heavy approaches: same output budget, better accuracy.&lt;/p&gt;
&lt;h3 id=&quot;why-our-agent-seemed-like-a-good-candidate&quot;&gt;Why Our Agent Seemed Like a Good Candidate&lt;/h3&gt;
&lt;p&gt;Our Scout delegate, the research agent in our &lt;a href=&quot;/posts/orchestrating-ai-agents-subagent-architecture/&quot;&gt;subagent architecture&lt;/a&gt; (&lt;a href=&quot;https://github.com/clouatre-labs/prompt-repetition-experiments/tree/main/recipe&quot;&gt;full recipe&lt;/a&gt;), runs on &lt;code&gt;claude-haiku-4-5&lt;/code&gt; at temperature 0.5 with extended thinking off. Haiku 4.5 is structurally a non-reasoning model (extended thinking is opt-in, not default), making it precisely the class of LLM the paper’s title targets.&lt;/p&gt;
&lt;p&gt;The paper tested Claude 3 Haiku alongside six other models; its strongest gains came from Gemini 2.0 Flash-Lite and GPT-4o-mini. We tested Claude 4.5 Haiku, a different model generation. Anthropic does not publish architectural details for either model. Whether the technique transfers across generations is an open question this experiment cannot answer, because our ceiling effects prevented any treatment from showing lift.&lt;/p&gt;
&lt;h3 id=&quot;why-this-matters-for-engineering-teams&quot;&gt;Why This Matters for Engineering Teams&lt;/h3&gt;
&lt;p&gt;Teams adopt AI techniques from papers without field-testing them first. &lt;a href=&quot;https://www.bcg.com/publications/2025/ai-adoption-puzzle-why-usage-up-impact-not&quot;&gt;BCG reports that 50% of companies are stagnating with AI&lt;/a&gt; (BCG, 2025), partly because they ship optimizations without measuring baselines. Shipping an unvalidated prompt change to production would cost more: doubled input tokens on every request, with no accuracy gain to show for it. As we covered in &lt;a href=&quot;/posts/ai-observability-gaps/&quot;&gt;observability for AI agents&lt;/a&gt;, optimizing without measuring before and after is flying blind.&lt;/p&gt;
&lt;h2 id=&quot;how-did-we-design-the-test&quot;&gt;How Did We Design the Test?&lt;/h2&gt;
&lt;p&gt;Both experiments shared the same core structure: 10 parallel async Scout delegates, split 5 control vs. 5 treatment, scored blind against a pre-registered rubric. For detailed methodology and raw data, see &lt;a href=&quot;https://github.com/clouatre-labs/prompt-repetition-experiments&quot;&gt;Supplementary Materials&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# Shared config across all 10 delegates&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;model&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; claude-haiku-4-5&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;temperature&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 0.5&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;extensions&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; developer&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; context7&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; brave_search&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;output&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;scout-run-{{ id }}.json&quot;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;delegate-config.yaml&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 2: Shared delegate configuration. All 10 runs use the same model, temperature, and extensions.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Control group:&lt;/strong&gt; standard Scout instructions (~3,805 characters, instructions x1).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Treatment group:&lt;/strong&gt; instructions repeated verbatim (~7,633 characters, instructions x2), mimicking the paper’s &lt;code&gt;&amp;#x3C;QUERY&gt;&amp;#x3C;QUERY&gt;&lt;/code&gt; pattern applied to the agent’s system prompt.&lt;/p&gt;
&lt;p&gt;The orchestrator spawned all 10 delegates simultaneously via Goose’s background task system and handed off structured JSON. A separate blind-scoring delegate received only the output files (no group labels) and scored each against the rubric. Group assignments were sealed in a &lt;a href=&quot;https://github.com/clouatre-labs/prompt-repetition-experiments/blob/main/experiments/exp1-fastmcp-refactor/label-map.json&quot;&gt;label map&lt;/a&gt; before scoring began.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Experiment pipeline: orchestrator spawns control and treatment delegate groups, both feed into a blind scorer, producing results&quot; loading=&quot;eager&quot; decoding=&quot;sync&quot; fetchpriority=&quot;high&quot; sizes=&quot;(min-width: 399px) 399px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;399&quot; height=&quot;406&quot; src=&quot;/_astro/experiment-flow.CJ67-C7q_Z24RnkD.webp&quot; srcset=&quot;/_astro/experiment-flow.CJ67-C7q_Z24RnkD.webp 399w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Blind evaluation pipeline. Group labels are stripped before scoring to prevent bias.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-happened-in-the-fastmcp-refactor-test&quot;&gt;What Happened in the FastMCP Refactor Test?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Target:&lt;/strong&gt; &lt;a href=&quot;https://github.com/clouatre-labs/math-mcp-learning-server/issues/222&quot;&gt;FastMCP session ID refactor&lt;/a&gt; in &lt;code&gt;math-mcp-learning-server&lt;/code&gt;. Open and unimplemented at the time of the experiment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rubric:&lt;/strong&gt; 6 binary criteria, pre-registered before any runs were examined.&lt;/p&gt;
&lt;p&gt;9 of 10 delegates produced valid output. &lt;code&gt;control-1&lt;/code&gt; ran 93 messages and wrote no output file. Session log analysis confirmed the file-write instruction appeared only at the end of the delegate prompt, and the model drifted past it. This is a delegate authoring flaw with a known fix: bookend critical instructions at the start and end.&lt;/p&gt;
&lt;p&gt;Across the 9 valid runs, 5 of 6 criteria scored 100% in both groups. The only variance was C5 (must-not constraint violations): control 5.50/6, treatment 5.80/6, delta +0.30. The treatment scored marginally higher, but at n=4 vs n=5 with binary outcomes, Fisher’s exact test is degenerate (p = 1.0). Full per-criterion scores are in the &lt;a href=&quot;https://github.com/clouatre-labs/prompt-repetition-experiments/tree/main/experiments/exp1-fastmcp-refactor&quot;&gt;raw data&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The task was too easy. The rubric could not discriminate. We needed a harder target.&lt;/p&gt;
&lt;h2 id=&quot;did-a-stricter-methodology-change-the-result&quot;&gt;Did a Stricter Methodology Change the Result?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Target:&lt;/strong&gt; &lt;a href=&quot;https://github.com/clouatre-labs/aptu/issues/737&quot;&gt;&lt;code&gt;aptu#737&lt;/code&gt;&lt;/a&gt;, a tree-sitter AST (Abstract Syntax Tree)-based security scanner evaluation. Harder task, requiring synthesis from source code rather than retrieval from issue text. Unimplemented when tested.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rubric:&lt;/strong&gt; 7 binary criteria. C5, C6, and C7 required the delegate to read and reason about actual source code, not just summarize the issue. Pre-registered before any runs began.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Methodology improvements over Experiment 1:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Blinded file naming from the start (&lt;code&gt;scout-run-01.json&lt;/code&gt; through &lt;code&gt;scout-run-10.json&lt;/code&gt; with sealed &lt;code&gt;label-map.json&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Mann-Whitney U test pre-specified (two-tailed, alpha = 0.05)&lt;/li&gt;
&lt;li&gt;Wall-clock latency recorded per delegate&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Results:&lt;/p&gt;

























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Group&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;th&gt;Wall-clock median&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Control (x1)&lt;/td&gt;&lt;td&gt;7/7 all runs&lt;/td&gt;&lt;td&gt;6m 21s&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Treatment (x2)&lt;/td&gt;&lt;td&gt;7/7 all runs&lt;/td&gt;&lt;td&gt;7m 29s&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Delta&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;+17.8%&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 1: Experiment 2 results. Zero variance in either group. Mann-Whitney U = 12.5, p = 1.0 (degenerate: complete ties, test cannot be evaluated).&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Every Scout, in every run, in both groups, scored 7/7. Even C5, C6, and C7, the synthesis criteria we specifically designed to require source code reasoning, hit 100% across the board.&lt;/p&gt;
&lt;p&gt;The 17.8% latency difference is in the expected direction (longer prompt, longer prefill), which is consistent with the paper’s Anthropic-specific latency caveat. At scale, that delta compounds: doubled tokens cost money, and the added prefill time costs throughput across every agent invocation. But n=5 cannot support any inference here, and the finding is further confounded by an infrastructure issue we discovered afterward.&lt;/p&gt;
&lt;p&gt;The scores told us nothing. The session logs told us something we did not expect.&lt;/p&gt;
&lt;h2 id=&quot;what-infrastructure-confound-did-we-miss&quot;&gt;What Infrastructure Confound Did We Miss?&lt;/h2&gt;
&lt;p&gt;Post-hoc session log analysis revealed a confound present in both experiments.&lt;/p&gt;
&lt;p&gt;Goose enforces a hard cap of &lt;strong&gt;5 concurrent background delegates&lt;/strong&gt;. When all 10 delegates were spawned simultaneously, runs 06-10 hit the cap and were queued into a second batch after runs 01-05 completed.&lt;/p&gt;
&lt;p&gt;The resulting batch structure was unbalanced:&lt;/p&gt;


























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Batch&lt;/th&gt;&lt;th&gt;Runs&lt;/th&gt;&lt;th&gt;Control&lt;/th&gt;&lt;th&gt;Treatment&lt;/th&gt;&lt;th&gt;Condition&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;1 (runs 01-05)&lt;/td&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;C1, C2, C3&lt;/td&gt;&lt;td&gt;T1, T2&lt;/td&gt;&lt;td&gt;Resource contested&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2 (runs 06-10)&lt;/td&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;C4, C5&lt;/td&gt;&lt;td&gt;T3, T4, T5&lt;/td&gt;&lt;td&gt;No contention&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 2: The 5-delegate concurrency cap split 10 simultaneous spawns into two unbalanced batches. Exact run assignments are in the &lt;a href=&quot;https://github.com/clouatre-labs/prompt-repetition-experiments/tree/main/experiments/exp2-treesitter-synthesis&quot;&gt;raw data&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Treatment delegates landed disproportionately in the less-contested second batch, making any latency comparison between groups uninterpretable. Accuracy was unaffected (ceiling effects dominated regardless), but the exposure is worth naming: &lt;strong&gt;pre-registration does not protect against runtime infrastructure behavior you did not know existed.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The confound matters for latency. But the bigger question is why accuracy showed zero variance in the first place.&lt;/p&gt;
&lt;h2 id=&quot;why-did-both-experiments-hit-100&quot;&gt;Why Did Both Experiments Hit 100%?&lt;/h2&gt;
&lt;p&gt;Two experiments, two rubrics designed to be harder than the last, two 100% results.&lt;/p&gt;
&lt;p&gt;This is itself a finding. A well-designed Scout delegate on a well-scoped engineering issue is already operating above the baseline accuracy threshold where prompt repetition shows lift. The paper’s largest gains came from synthetic positional tasks, &lt;a href=&quot;https://arxiv.org/abs/2512.14982&quot;&gt;NameIndex&lt;/a&gt; (Leviathan et al., 2025), where the answer is a name buried in a list. Real engineering issues, even unimplemented ones, give the agent structured context, code references, and acceptance criteria. The agent finds what it needs without help from the prefill geometry.&lt;/p&gt;
&lt;p&gt;This is the core finding: prompt repetition solves an attention problem that well-scoped engineering tasks do not have. The technique’s value is real, but the paper’s benchmarks do not cover agentic engineering tasks. Our experiments tested that boundary. When the agent already has structured context pointing it to the right code, repeating the instruction adds input tokens without adding accuracy signal. Understanding where SLMs succeed and fail on their own is not academic: hybrid architectures like &lt;a href=&quot;https://arxiv.org/abs/2504.09923&quot;&gt;SMART&lt;/a&gt; (Kim et al., 2025) use SLMs as the primary reasoning engine, with LLMs intervening only at critical junctures. Every prompt-level optimization that improves standalone SLM accuracy reduces how often the expensive backstop fires.&lt;/p&gt;
&lt;p&gt;For teams evaluating prompt techniques at scale, the implication is financial: doubling input tokens across every agent invocation is a measurable cost increase. If your agents already converge correctly on well-scoped tasks, that spend returns nothing. &lt;a href=&quot;https://arxiv.org/html/2406.03980v1&quot;&gt;Embracing negative results&lt;/a&gt; as a research practice (Berger et al., 2024) prevents exactly this kind of waste: publication bias toward positive results means the null findings that would have saved you the experiment often go unpublished.&lt;/p&gt;
&lt;h3 id=&quot;where-the-boundary-falls&quot;&gt;Where the Boundary Falls&lt;/h3&gt;
&lt;p&gt;The gap is between task types, not between models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Positional retrieval tasks&lt;/strong&gt; (NameIndex, needle-in-haystack): high positional attention decay, repetition helps&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Structured engineering tasks&lt;/strong&gt; (scoped issues with code context): low positional decay, Scout already converges correctly&lt;/li&gt;
&lt;/ul&gt;









































&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Dimension&lt;/th&gt;&lt;th&gt;Paper (Leviathan et al.)&lt;/th&gt;&lt;th&gt;Experiment 1&lt;/th&gt;&lt;th&gt;Experiment 2&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Task type&lt;/td&gt;&lt;td&gt;Standard + custom retrieval (MMLU-Pro, NameIndex, others)&lt;/td&gt;&lt;td&gt;Issue analysis (FastMCP refactor)&lt;/td&gt;&lt;td&gt;Source code synthesis (AST scanner)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Model&lt;/td&gt;&lt;td&gt;Gemini 2.0 Flash-Lite, Claude 3 Haiku, 5 others&lt;/td&gt;&lt;td&gt;Claude 4.5 Haiku&lt;/td&gt;&lt;td&gt;Claude 4.5 Haiku&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Sample size&lt;/td&gt;&lt;td&gt;McNemar test on full benchmark datasets (7 benchmarks, 7 models)&lt;/td&gt;&lt;td&gt;n=4 vs n=5 (1 dropped)&lt;/td&gt;&lt;td&gt;n=5 vs n=5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Accuracy delta&lt;/td&gt;&lt;td&gt;47/70 pairs improved, 0 regressed; +76pp on NameIndex (Flash-Lite)&lt;/td&gt;&lt;td&gt;+0.30 (noise)&lt;/td&gt;&lt;td&gt;0.00 (ceiling)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Confounds&lt;/td&gt;&lt;td&gt;None reported&lt;/td&gt;&lt;td&gt;Delegate authoring failure&lt;/td&gt;&lt;td&gt;5-delegate batch cap&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 3: Comparison of experimental conditions. The paper’s gains concentrate on positional retrieval tasks; our structured engineering tasks hit ceiling effects before any treatment could show lift.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Designing a rubric that discriminates between good and very good on the second category is harder than it looks. Both of ours failed. The criteria require synthesis and judgment under genuine ambiguity, not retrieval from a well-scoped document.&lt;/p&gt;
&lt;h2 id=&quot;what-did-we-learn-about-ai-evaluation-design&quot;&gt;What Did We Learn About AI Evaluation Design?&lt;/h2&gt;
&lt;h3 id=&quot;rubric-design-is-harder-than-experiment-design&quot;&gt;Rubric Design Is Harder Than Experiment Design&lt;/h3&gt;
&lt;p&gt;We iterated twice and hit the ceiling both times. A 7-point rubric with “source code synthesis” criteria is not automatically harder. It depends on whether the task actually creates ambiguity the agent must resolve. Ours did not.&lt;/p&gt;
&lt;p&gt;A practical calibration target: if your scoring delegate can answer any criterion by reading the issue alone (without running the code), the criterion will not discriminate.&lt;/p&gt;
&lt;h3 id=&quot;infrastructure-behavior-is-a-confounder&quot;&gt;Infrastructure Behavior Is a Confounder&lt;/h3&gt;
&lt;p&gt;The 5-delegate cap is undocumented. It is enforced as a hard rejection in source (&lt;code&gt;GOOSE_MAX_BACKGROUND_TASKS&lt;/code&gt; defaults to 5), with no queuing or retry. Excess delegates are dropped, not deferred. It silently split our groups into unbalanced batches. This category of confound (runtime resource limits, queue behavior, model routing) is endemic to agent systems and invisible without structured logging.&lt;/p&gt;
&lt;p&gt;Future experiments: spawn delegates in explicit batches of 5 with documented batch assignments. Record session IDs. Treat infrastructure state as a variable, not background noise.&lt;/p&gt;
&lt;h3 id=&quot;delegate-authoring-has-a-turn-length-problem&quot;&gt;Delegate Authoring Has a Turn-Length Problem&lt;/h3&gt;
&lt;p&gt;Long sessions drift from instructions that appear only once. The &lt;code&gt;control-1&lt;/code&gt; failure (93 messages, no output) demonstrated the fix: bookend critical actions at both the start and end of delegate prompts. This class of failure is predictable and preventable, but only if you treat delegate prompt structure as part of your experimental design.&lt;/p&gt;
&lt;p&gt;The blind scoring infrastructure proved its value here. Each run produced a structured justification the scorer generated without knowing group assignment:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;json&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;{&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;run_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;run-01&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;C1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;C2&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;C3&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;C4&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;C5&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;C6&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;C7&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;total&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 7&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;justifications&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;C1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;Issues #735/#736 explicitly identified as regex limitation.&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;C5&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;Backward compatibility addressed via hybrid approach.&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;C7&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;Synthesis connects tree-sitter AST parsing to existing rules.&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  }&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;scorer-output.json&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 3: Blind scorer output for a single run. Each criterion includes a justification generated without knowledge of group assignment.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;when-should-you-use-prompt-repetition&quot;&gt;When Should You Use Prompt Repetition?&lt;/h2&gt;
&lt;p&gt;The null result is not a failure of the paper. Prompt repetition won 47 out of 70 benchmark-model combinations with zero losses (Leviathan et al., 2025). The technique works. The question is where.&lt;/p&gt;
&lt;p&gt;The paper’s gains concentrate on &lt;strong&gt;benchmarks with positional retrieval components&lt;/strong&gt;: NameIndex, MiddleMatch, options-first multiple choice. Tasks where the answer depends on information placement in the context window. The paper also notes a &lt;strong&gt;neutral-to-slight effect with reasoning prompts&lt;/strong&gt; (5 wins, 1 loss, 22 neutral with step-by-step). Reasoning appears to compensate for the same attention decay that repetition addresses.&lt;/p&gt;
&lt;p&gt;The industry trend is not exclusively toward reasoning models. Capable SLMs are gaining ground. NVIDIA’s Nemotron 3 Nano (NVIDIA, 2025) activates 3 billion of its 30 billion parameters per token, delivering 3.3x the throughput of Qwen3-30B on a single H200, designed explicitly for multi-agent systems at scale. &lt;a href=&quot;https://arxiv.org/abs/2510.01265&quot;&gt;RLP&lt;/a&gt; (Hatamizadeh et al., 2025) embeds reinforcement learning into pretraining itself, lifting math and science accuracy by 19% on a 1.7B-parameter model without post-training reasoning. These models are non-reasoning by default. The causal masking limitation that prompt repetition addresses is structural to the decoder-only architecture all of them share. Their users are also the most cost-sensitive to doubled input tokens. Every token matters when you are optimizing for throughput at the edge.&lt;/p&gt;
&lt;p&gt;Our null result came from the other side of that boundary: structured engineering tasks where the agent already has scoped context, code references, and acceptance criteria. The ceiling was in the task, not the technique.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Decision flowchart: evaluate task type, check baseline headroom, run controlled experiment, adopt or skip&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; sizes=&quot;(min-width: 458px) 458px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;458&quot; height=&quot;1095&quot; src=&quot;/_astro/prompt-repetition-decision.qfNB-9yW_10FcJE.webp&quot; srcset=&quot;/_astro/prompt-repetition-decision.qfNB-9yW_10FcJE.webp 458w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: When to use prompt repetition. Three decision points, two outcomes.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;what-transfers-to-your-team&quot;&gt;What Transfers to Your Team&lt;/h3&gt;
&lt;p&gt;Three things that transfer directly to any team evaluating AI agent behavior:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Baseline accuracy determines whether any prompt technique has room to work.&lt;/strong&gt; Measure it before testing an optimization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Infrastructure constraints are confounder candidates.&lt;/strong&gt; Audit your delegate system’s limits before attributing latency or throughput differences to treatment variables.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rubric discrimination is the bottleneck.&lt;/strong&gt; Two rubrics, two ceiling effects. If your scoring criteria can be satisfied by reading the issue description alone, the rubric will not discriminate.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&quot;did-prompt-repetition-change-anything-else&quot;&gt;Did Prompt Repetition Change Anything Else?&lt;/h2&gt;
&lt;p&gt;One observation worth noting: treatment agents in both experiments used fewer output tokens and messages to reach the same scores. This is consistent with information-theoretic expectations; redundancy in the input reduces decoder uncertainty at each generation step, which should reduce exploratory turns in an agentic loop. The original paper’s benchmarks (MMLU, GSM8K) produce fixed-format answers where output length does not vary, making this effect invisible. Agentic workloads, where the model decides how many turns to take, may be where the efficiency signal surfaces.&lt;/p&gt;
&lt;p&gt;The economics are also different than single-turn benchmarks suggest: in a multi-turn session, the doubled prompt adds single-digit overhead to accumulated input, not 100%. The growing conversation history dominates each API call. In our data, treatment agents used 13.1% fewer input tokens and 15.4% fewer output tokens despite the longer prompt. Each avoided turn eliminates an entire context window from the running total. The effect is confounded and too small to draw conclusions, but it is a pattern worth investigating with a discriminating rubric.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;For how file-prediction accuracy maps to production governance, see &lt;a href=&quot;/posts/sre-ai-agents-production&quot;&gt;SRE for AI Agents: Error Budgets, Trust Ladders, and 90 Trials&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;BCG, “AI Adoption Puzzle: Why Usage Is Up But Impact Is Not” (2025) — &lt;a href=&quot;https://www.bcg.com/publications/2025/ai-adoption-puzzle-why-usage-up-impact-not&quot;&gt;https://www.bcg.com/publications/2025/ai-adoption-puzzle-why-usage-up-impact-not&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Berger et al., “Position: Embracing Negative Results in Machine Learning” (2024) — &lt;a href=&quot;https://arxiv.org/abs/2406.03980&quot;&gt;https://arxiv.org/abs/2406.03980&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Clouatre, H., “Orchestrating AI Agents: A Subagent Architecture for Code” (2025) — &lt;a href=&quot;https://clouatre.ca/posts/orchestrating-ai-agents-subagent-architecture/&quot;&gt;https://clouatre.ca/posts/orchestrating-ai-agents-subagent-architecture/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Clouatre, H., “Prompt Repetition Experiments: Supplementary Materials” (2026) — &lt;a href=&quot;https://github.com/clouatre-labs/prompt-repetition-experiments&quot;&gt;https://github.com/clouatre-labs/prompt-repetition-experiments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hatamizadeh, A. et al., “RLP: Reinforcement as a Pretraining Objective” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2510.01265&quot;&gt;https://arxiv.org/abs/2510.01265&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Kim, Y. et al., “Guiding Reasoning in Small Language Models with LLM Assistance” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2504.09923&quot;&gt;https://arxiv.org/abs/2504.09923&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Leviathan, Y., Kalman, M., and Matias, Y., “Prompt Repetition Improves Non-Reasoning LLMs” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2512.14982&quot;&gt;https://arxiv.org/abs/2512.14982&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Liu et al., “Lost in the Middle: How Language Models Use Long Contexts” (2023) — &lt;a href=&quot;https://arxiv.org/abs/2307.03172&quot;&gt;https://arxiv.org/abs/2307.03172&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;NVIDIA, “Nemotron 3: Efficient and Open Intelligence” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2512.20856&quot;&gt;https://arxiv.org/abs/2512.20856&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-engineering</category><category>goose</category><category>case-study</category><category>evaluation</category><author>Hugues Clouâtre</author></item><item><title>Why Your AI Agent Failed in Production</title><link>https://clouatre.ca/posts/ai-observability-gaps/</link><guid isPermaLink="true">https://clouatre.ca/posts/ai-observability-gaps/</guid><description>Why your AI agent failed: missing decision provenance, not metrics. The 3 observability gaps traditional monitoring won&apos;t catch.</description><pubDate>Tue, 03 Feb 2026 12:12:00 GMT</pubDate><content:encoded>&lt;p&gt;Your AI agent just approved a $50,000 invoice for office supplies. A legitimate vendor. The PO number matches. But the quantity is wrong by a factor of 10. By the time finance catches it, you’ve already paid, the goods already shipped, and you’re stuck negotiating a return.&lt;/p&gt;
&lt;p&gt;The agent’s logs show “decision: approved” but nothing about why it ignored the quantity anomaly that a human would have caught instantly. Without proper instrumentation, root cause analysis stretches from minutes to days. This is what happens when observability is treated as “nice to have” instead of foundational infrastructure.&lt;/p&gt;
&lt;p&gt;This post covers the production architecture, the vendor-neutral stack, and why you need to instrument before deployment, not after the first failure.&lt;/p&gt;
&lt;h2 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;&lt;details&gt;&lt;summary&gt;Contents&lt;/summary&gt;&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-should-observability-be-foundational-infrastructure&quot;&gt;Why Should Observability Be Foundational Infrastructure?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-observer-effect-paradox&quot;&gt;The Observer Effect Paradox&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-is-decision-provenance-and-why-does-compliance-require-it&quot;&gt;What Is Decision Provenance and Why Does Compliance Require It?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-every-framework-requires-decision-trails&quot;&gt;Why Every Framework Requires Decision Trails&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#structured-logging-with-correlation-ids&quot;&gt;Structured Logging with Correlation IDs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#tracking-tool-calls-with-genai-semantic-conventions&quot;&gt;Tracking Tool Calls with GenAI Semantic Conventions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-do-silent-integration-failures-kill-ai-agents&quot;&gt;How Do Silent Integration Failures Kill AI Agents?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-do-token-costs-spiral-out-of-control&quot;&gt;Why Do Token Costs Spiral Out of Control?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#alerting-on-token-budgets&quot;&gt;Alerting on Token Budgets&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-does-a-vendor-neutral-observability-stack-look-like&quot;&gt;What Does a Vendor-Neutral Observability Stack Look Like?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#how-the-components-connect&quot;&gt;How the Components Connect&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-trace-metric-correlation-matters&quot;&gt;Why Trace-Metric Correlation Matters&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-is-the-roi-and-how-do-you-start&quot;&gt;What Is the ROI and How Do You Start?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#are-you-ready-for-production&quot;&gt;Are You Ready for Production?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;&lt;/details&gt;&lt;p&gt;&lt;/p&gt;
&lt;h2 id=&quot;why-should-observability-be-foundational-infrastructure&quot;&gt;Why Should Observability Be Foundational Infrastructure?&lt;/h2&gt;
&lt;p&gt;When your agent fails in production, you need to answer three questions immediately: What decision did it make? What data did it use? How much did it cost? Without observability built in from day one, you’re flying blind. The difference between a 5-minute fix and a 5-hour war room is whether you instrumented decision provenance, integration health, and cost tracking before deployment.&lt;/p&gt;
&lt;h3 id=&quot;the-observer-effect-paradox&quot;&gt;The Observer Effect Paradox&lt;/h3&gt;
&lt;p&gt;Instrumentation changes what you measure. Synchronous logging to external systems adds latency to every LLM call. In multi-agent systems, this can trigger timeout-based retries where observability causes the failures it detects.&lt;/p&gt;
&lt;p&gt;OpenTelemetry’s &lt;a href=&quot;https://opentelemetry.io/docs/specs/otel/trace/sdk/#batching-processor&quot;&gt;BatchSpanProcessor&lt;/a&gt; solves this by queuing spans in memory and exporting in batches, minimizing per-request overhead.&lt;/p&gt;
&lt;h2 id=&quot;what-is-decision-provenance-and-why-does-compliance-require-it&quot;&gt;What Is Decision Provenance and Why Does Compliance Require It?&lt;/h2&gt;
&lt;p&gt;How do you prove your AI agent made the right decision six months ago when a regulator asks? Logging outputs without reasoning fails every major compliance framework.&lt;/p&gt;
&lt;h3 id=&quot;why-every-framework-requires-decision-trails&quot;&gt;Why Every Framework Requires Decision Trails&lt;/h3&gt;
&lt;p&gt;Every major compliance framework mandates reconstructible reasoning.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.aicpa-cima.com/resources/download/2017-trust-services-criteria-with-revised-points-of-focus-2022&quot;&gt;SOC 2 Type II&lt;/a&gt;: audit trails of system access and user activity. The &lt;code&gt;gen_ai.conversation.id&lt;/code&gt; attribute ties every decision to a user and timestamp.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://gdpr-info.eu/art-30-gdpr/&quot;&gt;GDPR Article 30&lt;/a&gt;: records of processing activities. Structured logs with trace IDs link inputs to outputs.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.kiteworks.com/hipaa-compliance/hipaa-audit-log-requirements/&quot;&gt;HIPAA&lt;/a&gt;: audit controls for ePHI access. Span attributes capture what data the agent accessed.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs-prv.pcisecuritystandards.org/PCI%20DSS/Standard/PCI-DSS-v4_0_1.pdf&quot;&gt;PCI DSS 4.0.1 Requirement 10&lt;/a&gt;: tracking cardholder data access with automated log reviews. Prometheus metrics enable real-time anomaly detection.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;structured-logging-with-correlation-ids&quot;&gt;Structured Logging with Correlation IDs&lt;/h3&gt;
&lt;p&gt;The fix links every decision to its inputs.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; opentelemetry &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; trace&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; opentelemetry&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;trace &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; Status&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; StatusCode&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; logging&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;tracer &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; trace&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;get_tracer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;__name__&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;logger &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; logging&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;getLogger&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;__name__&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;def&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt; make_decision&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;invoice_data&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; retrieved_context&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    with&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; tracer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;start_as_current_span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;make_decision&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; as&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;invoice.id&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; invoice_data&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;])&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;invoice.amount&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; invoice_data&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;amount&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;])&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;context.sources&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt; len&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;retrieved_context&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        decision &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; analyze&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;invoice_data&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; retrieved_context&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        confidence &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; calculate_confidence&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;decision&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;decision.result&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; decision&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;action&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;])&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;decision.confidence&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; confidence&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        logger&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;info&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;            &quot;Decision made&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;            extra&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;{&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;                &quot;trace_id&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt; format&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;get_span_context&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;().&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;trace_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;032x&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;                &quot;invoice_id&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; invoice_data&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;],&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;                &quot;decision&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; decision&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;action&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;],&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;                &quot;confidence&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; confidence&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;                &quot;context_count&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt; len&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;retrieved_context&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;            }&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;        )&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;        return&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; decision&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;agent/decision_logger.py&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 1: OpenTelemetry structured logging captures decision provenance with trace IDs, span attributes, and correlation across the entire request lifecycle.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This gives you a complete audit trail: trace ID links the decision to all upstream data retrievals, span attributes capture the decision logic, and structured logs provide queryable records. When the regulator asks “why did you approve invoice #12345?”, you can show exactly what data the agent saw and how it weighted each factor.&lt;/p&gt;
&lt;h3 id=&quot;tracking-tool-calls-with-genai-semantic-conventions&quot;&gt;Tracking Tool Calls with GenAI Semantic Conventions&lt;/h3&gt;
&lt;p&gt;Multi-agent systems make dozens of tool calls per decision. OpenTelemetry’s &lt;a href=&quot;https://opentelemetry.io/docs/specs/semconv/gen-ai/&quot;&gt;GenAI semantic conventions&lt;/a&gt; provide standard attributes for tracking these interactions:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; opentelemetry &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; trace&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;tracer &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; trace&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;get_tracer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;__name__&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;def&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt; execute_tool_call&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;tool_name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; arguments&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; conversation_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    with&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; tracer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;start_as_current_span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;execute_tool&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; as&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;        # Standard GenAI attributes&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;gen_ai.operation.name&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;execute_tool&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;gen_ai.tool.name&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; tool_name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;gen_ai.conversation.id&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; conversation_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;gen_ai.tool.call.arguments&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; str&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;arguments&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        result &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; call_tool&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;tool_name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; arguments&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;gen_ai.tool.call.result&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; str&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;result&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;        return&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; result&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;agent/tool_tracking.py&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 2: GenAI semantic conventions enable cross-platform analysis across LangChain, LlamaIndex, and custom agents.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Standard attributes like &lt;code&gt;gen_ai.tool.name&lt;/code&gt; let you answer operational questions across your entire stack: “Which tools fail most often?” or “Which conversations require the most tool calls?” When you swap frameworks, your dashboards still work.&lt;/p&gt;
&lt;h2 id=&quot;how-do-silent-integration-failures-kill-ai-agents&quot;&gt;How Do Silent Integration Failures Kill AI Agents?&lt;/h2&gt;
&lt;p&gt;Your AI agent calls a legacy API that returns HTTP 200 with an empty result set. The agent interprets “no data” as “no problem” and proceeds. But the API actually failed silently because the database connection pool was exhausted. By the time you notice, you’ve processed 500 transactions with incomplete data.&lt;/p&gt;
&lt;p&gt;AI agents don’t fail loudly. They fail gracefully, hiding problems until they cascade. You need distributed tracing that correlates agent decisions with integration health across every dependency.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; opentelemetry &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; trace&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; propagate&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; opentelemetry&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;trace &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; Status&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; StatusCode&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;tracer &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; trace&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;get_tracer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;__name__&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;def&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt; retrieve_from_legacy_api&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;query&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    with&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; tracer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;start_as_current_span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;legacy_api_call&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; as&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;api.endpoint&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;/legacy/search&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;query&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; query&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        headers &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; {}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        propagate&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;inject&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;headers&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;  # Inject trace context &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        response &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; requests&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;get&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;            &quot;https://legacy.example.com/search&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;            params&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;{&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;q&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; query&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;},&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;            headers&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;headers&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;        )&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;http.status_code&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; response&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;status_code&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_attribute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;response.size&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt; len&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;response&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;content&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;        if&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; response&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;status_code &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;==&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 200&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; and&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt; len&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;response&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;json&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;())&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; ==&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 0&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;            span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;set_status&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;Status&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;StatusCode&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;ERROR&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;Empty result set&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;            span&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;add_event&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;Suspicious empty response from healthy endpoint&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;        return&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; response&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;json&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;agent/trace_integration.py&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 3: Distributed tracing propagates correlation IDs and flags suspicious patterns like empty responses from healthy endpoints.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Correlation ID propagation (line 12) and explicit error marking for suspicious patterns (lines 23-25) are what matter. When you see a spike in “empty result set” errors correlated with database saturation metrics, you know the integration is degraded even though HTTP status codes look fine.&lt;/p&gt;
&lt;h2 id=&quot;why-do-token-costs-spiral-out-of-control&quot;&gt;Why Do Token Costs Spiral Out of Control?&lt;/h2&gt;
&lt;p&gt;Your agent works in testing. Then production traffic hits and your LLM bill explodes. &lt;a href=&quot;https://www.cloudzero.com/state-of-ai-costs/&quot;&gt;AI costs are surging 36% year-over-year&lt;/a&gt;, yet only half of organizations can confidently evaluate ROI (CloudZero, 2025). Without per-operation cost tracking, you can’t identify which workflows are burning money.&lt;/p&gt;
&lt;p&gt;Consider a Claude 4.5 Sonnet deployment: input tokens cost $3/million, output tokens cost $15/million. A single complex query might use 50K input tokens and 4K output tokens, costing $0.21. At 10,000 queries per day, that’s $2,100 daily, or $63,000 monthly, just for one workflow. If your agent retries on failures or chains multiple calls, costs multiply fast.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; prometheus_client &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; Counter&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; Histogram&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# Token counter with model and operation labels&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;tokens_total &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; Counter&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;    &apos;ai_tokens_total&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;    &apos;Total tokens consumed&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;model&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &apos;operation&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &apos;user_tier&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# Latency histogram with cost correlation&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;request_duration &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; Histogram&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;    &apos;ai_request_duration_seconds&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;    &apos;Request duration&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;operation&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;],&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;    buckets&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;0.1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 0.5&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 1.0&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 2.0&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 5.0&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 10.0&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; float&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;inf&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;def&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt; process_query&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;query&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; user_tier&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    with&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; request_duration&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;labels&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;operation&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;query&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;).&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;time&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;():&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        embedding &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; embed&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;query&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        tokens_total&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;labels&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;            model&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;text-embedding-3-small&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;            operation&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;embed&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;            user_tier&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;user_tier &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;        ).&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;inc&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;len&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;query&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;split&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;()))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        results &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; vector_search&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;embedding&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        response &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; generate_response&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;results&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        tokens_total&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;labels&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;            model&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;claude-4.5-sonnet&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;            operation&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;generate&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;            user_tier&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;user_tier &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;        ).&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;inc&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;response&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;usage&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;][&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;total_tokens&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;])&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;        &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;        return&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; response&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;agent/metrics.py&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 4: Prometheus metrics track token usage with labels for model, operation, and user tier, enabling real-time cost monitoring and per-operation granularity to prevent budget overruns.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;alerting-on-token-budgets&quot;&gt;Alerting on Token Budgets&lt;/h3&gt;
&lt;p&gt;Labels let you slice cost by model, operation, and user tier. When free-tier token usage spikes on expensive models, you can throttle, switch to cheaper models, or convert users to paid tiers before costs spiral.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;groups&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ai-cost-alerts&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  rules&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; alert&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; TokenBudgetExceeded&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    expr&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; sum(rate(ai_tokens_total[5m])) by (user_tier) &gt; 1000&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    for&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; 5m&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    labels&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;      severity&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; warning&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    annotations&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;      summary&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;Token budget exceeded for {{ $labels.user_tier }}&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;      description&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;{{ $labels.user_tier }} tier consuming {{ $value }} tokens/sec&quot;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;ai-alerts.yml&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 5: Prometheus alerting rule triggers when any user tier exceeds 1,000 tokens per second sustained over 5 minutes, enabling proactive cost control before budget overruns.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-does-a-vendor-neutral-observability-stack-look-like&quot;&gt;What Does a Vendor-Neutral Observability Stack Look Like?&lt;/h2&gt;
&lt;p&gt;Enterprise platforms like Datadog and Splunk offer polished, integrated experiences. For teams prioritizing cloud-native portability, OpenTelemetry handles instrumentation, Prometheus stores metrics, and Grafana visualizes everything. Zero licensing cost, no vendor lock-in, and production-proven.&lt;/p&gt;
&lt;p&gt;Already invested in an enterprise platform? OpenTelemetry collectors export directly to these platforms, preserving full trace context and semantic attributes. You can adopt incrementally without disrupting existing dashboards, gaining richer observability now and portability for future migrations.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;OpenTelemetry to Grafana stack&quot; loading=&quot;eager&quot; decoding=&quot;sync&quot; fetchpriority=&quot;high&quot; sizes=&quot;(min-width: 272px) 272px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;272&quot; height=&quot;486&quot; src=&quot;/_astro/ai-observability-stack.C8nZtGWk_1fDRDV.webp&quot; srcset=&quot;/_astro/ai-observability-stack.C8nZtGWk_1fDRDV.webp 272w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: OpenTelemetry + Prometheus + Grafana stack&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;how-the-components-connect&quot;&gt;How the Components Connect&lt;/h3&gt;
&lt;p&gt;Your agent emits traces, metrics, and logs via OpenTelemetry SDKs. The OpenTelemetry Collector receives, processes, and routes telemetry to backends. Prometheus scrapes metrics and stores time-series data. Grafana queries Prometheus for metrics, Tempo for traces, and Loki for logs, correlating them in unified dashboards.&lt;/p&gt;
&lt;h3 id=&quot;why-trace-metric-correlation-matters&quot;&gt;Why Trace-Metric Correlation Matters&lt;/h3&gt;
&lt;p&gt;When a user reports “the agent is slow”, start in Grafana, filter metrics by user ID, see elevated p95 latency, drill down to the request, and find the bottleneck in 30 seconds. Without correlation, you’re grepping logs for hours.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Trace ID propagation with per-operation latency breakdown&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; sizes=&quot;(min-width: 202px) 202px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;202&quot; height=&quot;686&quot; src=&quot;/_astro/ai-decision-trace.DRjMXTfG_LeNM4.webp&quot; srcset=&quot;/_astro/ai-decision-trace.DRjMXTfG_LeNM4.webp 202w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Trace ID propagation with per-operation latency breakdown&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Every request gets a correlation ID that propagates through RAG retrieval, API calls, and decision logic. When you need to audit a decision, query by trace ID to reconstruct the entire flow: what data was retrieved, which APIs were called, response times, and token consumption.&lt;/p&gt;
&lt;h2 id=&quot;what-is-the-roi-and-how-do-you-start&quot;&gt;What Is the ROI and How Do You Start?&lt;/h2&gt;
&lt;p&gt;Setup cost for the vendor-neutral stack is roughly 40-80 hours of engineering time ($8K-$16K at $200/hour), with payback in 1-3 months. &lt;a href=&quot;https://chronosphere.io/forrester-total-economic-impact-chronosphere/&quot;&gt;Chronosphere’s Forrester TEI study&lt;/a&gt; shows 165% ROI with 6-month payback for observability investments (Forrester, 2022). &lt;a href=&quot;https://tei.forrester.com/go/microsoft/microsoft_sentinel/&quot;&gt;Microsoft Sentinel’s TEI study&lt;/a&gt; shows 201% ROI for security observability (Forrester, 2020). This stack delivers comparable ROI with full portability.&lt;/p&gt;





















&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Without&lt;/th&gt;&lt;th&gt;With&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;$5K-$20K/month untracked token spend&lt;/td&gt;&lt;td&gt;Per-request cost attribution&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;4-8 hours debugging per incident&lt;/td&gt;&lt;td&gt;30 minutes with trace correlation&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;$20K-$50K manual audit reconstruction&lt;/td&gt;&lt;td&gt;Query-ready decision logs&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 1: AI Observability ROI&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Start small: instrument one critical path (the highest-risk decision your agent makes) with decision provenance logging. Add integration health tracing for your most fragile API dependency. Implement cost tracking for your most expensive model. Expand based on what breaks. This is the same incremental approach I described in &lt;a href=&quot;/posts/ai-agents-legacy-roi/&quot;&gt;my AI agents ROI post&lt;/a&gt;: start with 5% of workflows, prove value, then scale.&lt;/p&gt;
&lt;p&gt;Already using &lt;a href=&quot;/posts/rag-legacy-systems/&quot;&gt;RAG for legacy systems&lt;/a&gt;? Add distributed tracing to correlate retrieval failures with agent decisions. Implementing &lt;a href=&quot;/posts/ai-augmented-cicd/&quot;&gt;AI-augmented CI/CD&lt;/a&gt;? Instrument the feedback loop to measure latency reduction. Building &lt;a href=&quot;/posts/orchestrating-ai-agents-subagent-architecture/&quot;&gt;multi-agent orchestration&lt;/a&gt;? Add trace IDs to handoff files to debug cross-agent failures. Even without a full tracing backend, a shared ID lets you grep the entire workflow chain.&lt;/p&gt;
&lt;p&gt;Decision provenance, integration health, and cost runaway are not edge cases. They cause production AI failures. Fix them before deployment, not after the invoice arrives.&lt;/p&gt;
&lt;h2 id=&quot;are-you-ready-for-production&quot;&gt;Are You Ready for Production?&lt;/h2&gt;
&lt;p&gt;Before your next AI deployment, verify these four capabilities:&lt;/p&gt;
&lt;ul class=&quot;contains-task-list&quot;&gt;
&lt;li class=&quot;task-list-item&quot;&gt;&lt;input type=&quot;checkbox&quot; disabled&gt; &lt;strong&gt;Decision provenance&lt;/strong&gt;: High-risk workflows log inputs, reasoning, and outputs with trace IDs using OpenTelemetry and structured logging&lt;/li&gt;
&lt;li class=&quot;task-list-item&quot;&gt;&lt;input type=&quot;checkbox&quot; disabled&gt; &lt;strong&gt;Integration health&lt;/strong&gt;: Distributed tracing covers legacy APIs and third-party services, with alerts on silent failures like empty responses from healthy endpoints&lt;/li&gt;
&lt;li class=&quot;task-list-item&quot;&gt;&lt;input type=&quot;checkbox&quot; disabled&gt; &lt;strong&gt;Cost attribution&lt;/strong&gt;: Token usage tracked per model, operation, and user tier, with budget alerts via Prometheus metrics&lt;/li&gt;
&lt;li class=&quot;task-list-item&quot;&gt;&lt;input type=&quot;checkbox&quot; disabled&gt; &lt;strong&gt;Audit reconstruction&lt;/strong&gt;: Any decision from the past 12 months can be fully reconstructed in under 30 minutes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you can’t check all four, you’re not ready for production.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;For the SRE framework that operationalizes decision provenance with error budgets and trust ladders, see &lt;a href=&quot;/posts/sre-ai-agents-production&quot;&gt;SRE for AI Agents: Error Budgets, Trust Ladders, and 90 Trials&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;AICPA, “2017 Trust Services Criteria (With Revised Points of Focus - 2022)” (2022) - &lt;a href=&quot;https://www.aicpa-cima.com/resources/download/2017-trust-services-criteria-with-revised-points-of-focus-2022&quot;&gt;https://www.aicpa-cima.com/resources/download/2017-trust-services-criteria-with-revised-points-of-focus-2022&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;CloudZero, “The State of AI Costs in 2025” (2025) - &lt;a href=&quot;https://www.cloudzero.com/state-of-ai-costs/&quot;&gt;https://www.cloudzero.com/state-of-ai-costs/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;European Union, “GDPR Article 30: Records of Processing Activities” (2018) - &lt;a href=&quot;https://gdpr-info.eu/art-30-gdpr/&quot;&gt;https://gdpr-info.eu/art-30-gdpr/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Forrester Consulting, “The Total Economic Impact of Chronosphere” (2022) - &lt;a href=&quot;https://chronosphere.io/forrester-total-economic-impact-chronosphere/&quot;&gt;https://chronosphere.io/forrester-total-economic-impact-chronosphere/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Forrester Consulting, “The Total Economic Impact of Microsoft Sentinel” (2020) - &lt;a href=&quot;https://tei.forrester.com/go/microsoft/microsoft_sentinel/&quot;&gt;https://tei.forrester.com/go/microsoft/microsoft_sentinel/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Kiteworks, “HIPAA Audit Logs: Complete Requirements for Healthcare Compliance in 2025” (2025) - &lt;a href=&quot;https://www.kiteworks.com/hipaa-compliance/hipaa-audit-log-requirements/&quot;&gt;https://www.kiteworks.com/hipaa-compliance/hipaa-audit-log-requirements/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenTelemetry, “Semantic Conventions for Generative AI Systems” — &lt;a href=&quot;https://opentelemetry.io/docs/specs/semconv/gen-ai/&quot;&gt;https://opentelemetry.io/docs/specs/semconv/gen-ai/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenTelemetry, “Tracing SDK Specification” — &lt;a href=&quot;https://opentelemetry.io/docs/specs/otel/trace/sdk/&quot;&gt;https://opentelemetry.io/docs/specs/otel/trace/sdk/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;PCI Security Standards Council, “PCI DSS v4.0.1” (2024) - &lt;a href=&quot;https://docs-prv.pcisecuritystandards.org/PCI%20DSS/Standard/PCI-DSS-v4_0_1.pdf&quot;&gt;https://docs-prv.pcisecuritystandards.org/PCI%20DSS/Standard/PCI-DSS-v4_0_1.pdf&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-engineering</category><category>observability</category><category>compliance</category><category>architecture</category><author>Hugues Clouâtre</author></item><item><title>RAG for Legacy Systems: 7,432 Pages to 3s Answers</title><link>https://clouatre.ca/posts/rag-legacy-systems/</link><guid isPermaLink="true">https://clouatre.ca/posts/rag-legacy-systems/</guid><description>7,432 pages to 3-second answers. Production RAG for legacy systems with model-agnostic reranking. No vendor lock-in, validated across 4 LLM families.</description><pubDate>Tue, 17 Feb 2026 12:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Your legacy system documentation is 20 years old, 7,432 pages, and locked in PDFs. Manual search takes 15-30 minutes per query. We made it queryable in 170 seconds. Query response time: 3-5 seconds. ROI break-even: one day.&lt;/p&gt;
&lt;p&gt;This isn’t a prototype. It’s Retrieval-Augmented Generation (RAG) on Amazon Bedrock, a system that retrieves relevant documentation and uses an LLM to generate answers without retraining models. Validated across four LLM families with 480 measurements. The implementation indexes 20,679 chunks and delivers sub-5-second responses with model-agnostic reranking. Overhead: 27.2ms ± 4.6ms regardless of which LLM you use.&lt;/p&gt;
&lt;p&gt;Yes, 7,432 pages fit in any search index. But ranked results aren’t answers.&lt;/p&gt;
&lt;p&gt;Here’s the production architecture, the multi-model validation data, and why you can switch providers without re-tuning.&lt;/p&gt;
&lt;h2 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;&lt;details&gt;&lt;summary&gt;Contents&lt;/summary&gt;&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-rag-not-fine-tuning&quot;&gt;Why RAG, Not Fine-Tuning?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-does-rag-turn-pdfs-into-answers&quot;&gt;How Does RAG Turn PDFs Into Answers?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-ingestion-pipeline&quot;&gt;The Ingestion Pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#hybrid-retrieval-bm25--vector-search&quot;&gt;Hybrid Retrieval: BM25 + Vector Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#example-error-lookup-in-34-seconds&quot;&gt;Example: Error Lookup in 3.4 Seconds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#local-embeddings-and-model-agnostic-design&quot;&gt;Local Embeddings and Model-Agnostic Design&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#does-reranking-work-across-different-models&quot;&gt;Does Reranking Work Across Different Models?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-are-the-real-performance-numbers&quot;&gt;What Are the Real Performance Numbers?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#whats-the-roi-without-modernization&quot;&gt;What’s the ROI Without Modernization?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#when-does-rag-fail&quot;&gt;When Does RAG Fail?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#hallucination&quot;&gt;Hallucination&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#context-overflow&quot;&gt;Context Overflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#stale-data&quot;&gt;Stale Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#corpus-limitations&quot;&gt;Corpus Limitations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-is-the-overall-failure-rate&quot;&gt;What is the Overall Failure Rate?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-do-you-migrate-from-prototype-to-production&quot;&gt;How Do You Migrate from Prototype to Production?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-should-you-do-next&quot;&gt;What Should You Do Next?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;&lt;/details&gt;&lt;p&gt;&lt;/p&gt;
&lt;h2 id=&quot;why-rag-not-fine-tuning&quot;&gt;Why RAG, Not Fine-Tuning?&lt;/h2&gt;
&lt;p&gt;Fine-tuning trains a model on your docs. It bakes knowledge into weights (making provenance verification difficult), requires retraining for every update, and costs &lt;a href=&quot;https://www.thundercompute.com/blog/ai-gpu-rental-market-trends&quot;&gt;$1.32-6.24 per run on A100 GPUs&lt;/a&gt; (Thunder Compute, 2025). RAG costs $0 setup with local embeddings, $0.0011 per query on Bedrock, updates in 2 seconds, and keeps sources verifiable.&lt;/p&gt;
&lt;p&gt;For legacy systems, choose RAG for operational factors, not economics. Documentation is scattered across wikis and PDFs. It evolves as reverse-engineering uncovers new system behaviors, while fine-tuning would require retraining each time. Query volume is low (dozens per week). The deciding factors: instant updates (2 seconds vs retraining), source citations for compliance, and simpler maintenance. We chose RAG for agility: 170s setup, 2s updates, under $20/year.&lt;/p&gt;
&lt;h2 id=&quot;how-does-rag-turn-pdfs-into-answers&quot;&gt;How Does RAG Turn PDFs Into Answers?&lt;/h2&gt;
&lt;h3 id=&quot;the-ingestion-pipeline&quot;&gt;The Ingestion Pipeline&lt;/h3&gt;
&lt;p&gt;The pipeline has six stages:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Extract&lt;/strong&gt; - Pull text from PDFs using PyMuPDF (44 pages/second)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transform&lt;/strong&gt; - Convert to Markdown with heading detection&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Chunk&lt;/strong&gt; - Split at headings (1,000-char limit, 200-char overlap)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Embed&lt;/strong&gt; - Generate vectors with all-MiniLM-L6-v2 (local, free)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Index&lt;/strong&gt; - Store in ChromaDB vector database&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retrieve&lt;/strong&gt; - Hybrid search (BM25 + vector) with FlashRank reranking&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; fitz  &lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# pymupdf&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;doc &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; fitz&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;open&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;pdf_path&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;for&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; page_num &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt; range&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;len&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;doc&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    page &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; doc&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;page_num&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    text &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; page&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;get_text&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;()&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    for&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; line &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; text&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;split&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;\n&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;        # Detect chapter headings (e.g., &quot;Chapter 1. Title&quot;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;        if&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; line&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;startswith&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;Chapter &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; and&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;. &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; in&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; line&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;            cleaned_lines&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;append&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;f&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;\n&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;## &lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;{&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;line&lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;}\n&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;        # Detect section headings&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;        elif&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt; len&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;line&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; &amp;#x3C;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 80&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; and&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; line&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;].&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;isupper&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;():&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;            cleaned_lines&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;append&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;f&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;\n&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;### &lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;{&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;line&lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;}\n&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;doc&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;close&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;src/ingest.py&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 1: PyMuPDF extracts text and converts to Markdown with heading detection, processing 44 pages/second.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The pipeline converts PDFs to Markdown before chunking. This preserves document structure (chapters, sections, headings) and enables Markdown-aware chunking that respects semantic boundaries. Chunks split at heading boundaries (&lt;code&gt;## &lt;/code&gt;, &lt;code&gt;### &lt;/code&gt;) instead of mid-paragraph, keeping related content together. The Markdown files are cached, so subsequent runs skip PDF extraction and complete in 2 seconds instead of 170 seconds.&lt;/p&gt;
&lt;h3 id=&quot;hybrid-retrieval-bm25--vector-search&quot;&gt;Hybrid Retrieval: BM25 + Vector Search&lt;/h3&gt;
&lt;p&gt;Why not just Elasticsearch or Ctrl+F? Pure keyword search fails when you search “memory error” but the 2005 docs say “data file cache exhaustion.” Pure vector search misses exact terms like “port 5432.” Hybrid retrieval solves ranking. The LLM solves synthesis: combining fragments from multiple documents into an actionable answer. Reciprocal Rank Fusion (RRF) &lt;a href=&quot;https://arxiv.org/abs/2401.04055&quot;&gt;consistently outperforms single-method search&lt;/a&gt; (Mandikal &amp;#x26; Mooney, 2024).&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# RRF combines BM25 + vector scores&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;doc_scores&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; dict&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt;str&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; tuple&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;Document&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; float&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]]&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; =&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; {}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;for&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; rank&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; idx &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt; enumerate&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;bm25_top_indices&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;retrieve_k&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    doc &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; chunks&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;idx&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    doc_id &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; doc&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;metadata&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;get&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;source&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; +&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; str&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;hash&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;doc&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;page_content&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;100&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    rrf_score &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; /&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; (&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;rank &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;+&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 60&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;  # RRF with k=60&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    if&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; doc_id &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; doc_scores&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;        doc_scores&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;doc_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; =&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; (&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;doc&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; doc_scores&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;doc_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;][&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; +&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; rrf_score&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    else&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;        doc_scores&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;doc_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; =&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; (&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;doc&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; rrf_score&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;src/rag.py&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 2: RRF formula combines keyword and semantic search scores with k=60 constant.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Hybrid retrieval returns 16 candidate chunks. A cross-encoder model (FlashRank) scores each query-document pair and returns the top 8. This fixes the precision problem: high recall from hybrid search, high precision from reranking.&lt;/p&gt;
&lt;h3 id=&quot;example-error-lookup-in-34-seconds&quot;&gt;Example: Error Lookup in 3.4 Seconds&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;User query:&lt;/strong&gt; “What is error 1006030 and how do I fix it?”&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Generated answer:&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Error 1006030: “Failed to bring a data file page into cache. Data file cache is too small.”&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cause:&lt;/strong&gt; Essbase cannot store the data file page in the data file cache.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Increase the data file cache size. After fixing, check for database corruption (Error Message Reference, p. 126).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Timing:&lt;/strong&gt; 3.4s total (retrieval: 80ms, reranking: 31ms, generation: 3.3s)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Retrieved from:&lt;/strong&gt; Error Message Reference v11.1.1 (ranked 3rd of 8 after reranking)&lt;/p&gt;
&lt;p&gt;The system retrieved error 1006030 from the Error Message Reference (ranked 3rd of 8 after reranking) and synthesized an actionable answer. Manual search would require opening the 1,200-page Error Message Reference PDF and using Ctrl+F.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;RAG Pipeline with Reranking&quot; loading=&quot;eager&quot; decoding=&quot;sync&quot; fetchpriority=&quot;high&quot; sizes=&quot;(min-width: 395px) 395px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;395&quot; height=&quot;1430&quot; src=&quot;/_astro/rag-pipeline-reranking.77Vt8g2b_Z1xv97w.webp&quot; srcset=&quot;/_astro/rag-pipeline-reranking.77Vt8g2b_Z1xv97w.webp 395w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: RAG pipeline with hybrid retrieval and reranking (FlashRank adds 31ms overhead for &lt;a href=&quot;https://arxiv.org/abs/2601.03258&quot;&gt;6-8% accuracy gain&lt;/a&gt;) (George, 2025)&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;local-embeddings-and-model-agnostic-design&quot;&gt;Local Embeddings and Model-Agnostic Design&lt;/h3&gt;
&lt;p&gt;Why local embeddings? Cost, simplicity, and performance. Cloud embedding APIs charge $0.10-0.50 per million tokens. Local models are free, require no API keys, and embed 1,000 chunks in under 10 seconds on CPU. The all-MiniLM-L6-v2 model is 80 MB and runs without GPU acceleration.&lt;/p&gt;
&lt;p&gt;The architecture is model-agnostic by design. We use Amazon Bedrock, but the same pipeline works with Azure OpenAI, Google Vertex AI, or local models.&lt;/p&gt;
&lt;h2 id=&quot;does-reranking-work-across-different-models&quot;&gt;Does Reranking Work Across Different Models?&lt;/h2&gt;
&lt;p&gt;We tested four LLM families across two providers (Amazon Bedrock, OpenRouter) to validate portability. Mean latency: 27.2ms ± 4.6ms across 480 measurements, with no statistically significant difference (ANOVA p=0.34). Cross-provider variance was only 4.1ms.&lt;/p&gt;



































&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Model&lt;/th&gt;&lt;th&gt;Family&lt;/th&gt;&lt;th&gt;Latency&lt;/th&gt;&lt;th&gt;Provider&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Claude Haiku 4.5&lt;/td&gt;&lt;td&gt;Anthropic&lt;/td&gt;&lt;td&gt;+31.3ms&lt;/td&gt;&lt;td&gt;Amazon Bedrock&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mistral Devstral-2512&lt;/td&gt;&lt;td&gt;Mistral&lt;/td&gt;&lt;td&gt;+32.5ms&lt;/td&gt;&lt;td&gt;OpenRouter&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Llama 3.3 Instruct&lt;/td&gt;&lt;td&gt;Meta&lt;/td&gt;&lt;td&gt;+24.1ms&lt;/td&gt;&lt;td&gt;OpenRouter&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Qwen 2.5 Coder&lt;/td&gt;&lt;td&gt;Alibaba&lt;/td&gt;&lt;td&gt;+25.1ms&lt;/td&gt;&lt;td&gt;OpenRouter&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 1: Latency is consistent across models and providers (480 measurements, ANOVA p=0.34)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The latency is dominated by FlashRank’s cross-encoder, not the LLM. This means you implement once and switch providers without re-tuning.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; flashrank &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; Ranker&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; RerankRequest&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;def&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt; _rerank&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D20F39;--shiki-light-font-style:italic;--shiki-dark:#ED8796;--shiki-dark-font-style:italic&quot;&gt;self&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; query&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; str&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; docs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; list&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;Document&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;])&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; -&gt;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; list&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;Document&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    passages &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;        {&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; i&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;text&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; doc&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;page_content&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;meta&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; doc&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;metadata&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;        for&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; i&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; doc &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt; enumerate&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;docs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    ]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    rerank_request &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; RerankRequest&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;query&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;query&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; passages&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;passages&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    results &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D20F39;--shiki-light-font-style:italic;--shiki-dark:#ED8796;--shiki-dark-font-style:italic&quot;&gt; self&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;ranker&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;rerank&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;rerank_request&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    return&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;docs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;result&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]]&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; for&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; result &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; results&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;RERANK_TOP_N&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;src/rag.py&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 3: FlashRank reranks 16 candidates in 31ms using cross-encoder scoring.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Reranking is infrastructure, not model-specific configuration. Build it into your retrieval pipeline and forget about it.&lt;/p&gt;
&lt;h2 id=&quot;what-are-the-real-performance-numbers&quot;&gt;What Are the Real Performance Numbers?&lt;/h2&gt;
&lt;p&gt;With model-agnostic reranking validated, here are the production metrics.&lt;/p&gt;
&lt;p&gt;We indexed 7,432 pages in 170 seconds. First-time setup includes PDF extraction (120s), chunking (20s), embedding (25s), and indexing (5s). Cached runs skip extraction and take 2.2 seconds. Query response time averages 3-5 seconds: retrieval (80ms), LLM generation (4s), overhead (200ms).&lt;/p&gt;
&lt;p&gt;Cost per query is $0.01-0.05 on Amazon Bedrock. Input tokens (context from retrieved chunks) cost $0.25 per million. Output tokens (LLM answer) cost $1.25 per million. A typical query uses 2,000 input tokens and 500 output tokens, totaling $0.0011.&lt;/p&gt;








































&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;System A (Docs)&lt;/th&gt;&lt;th&gt;System B (Notes)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Documents&lt;/td&gt;&lt;td&gt;14 PDFs (7,432 pages)&lt;/td&gt;&lt;td&gt;94 markdown files&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Chunks&lt;/td&gt;&lt;td&gt;20,679&lt;/td&gt;&lt;td&gt;Auto-chunked&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;First run&lt;/td&gt;&lt;td&gt;170s&lt;/td&gt;&lt;td&gt;40s&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Cached run&lt;/td&gt;&lt;td&gt;2.2s&lt;/td&gt;&lt;td&gt;2s&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Query time&lt;/td&gt;&lt;td&gt;3-5s&lt;/td&gt;&lt;td&gt;3-5s&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Cost/query&lt;/td&gt;&lt;td&gt;$0.01-0.05&lt;/td&gt;&lt;td&gt;$0.01-0.05&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 2: Performance metrics across two production RAG systems (System A handles technical docs, System B processes meeting notes)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Reranking adds 31ms to retrieval time. That’s a 65% increase in retrieval latency but only 0.3% of total query time. Users don’t notice 31ms in a 9-second end-to-end response. The 6-8% accuracy improvement compounds with hybrid retrieval’s gains over single-method search, making the overhead negligible compared to the final quality benefit. For detailed methodology and raw data, see &lt;a href=&quot;https://github.com/clouatre-labs/rag-reranking-benchmarks&quot;&gt;Supplementary Materials&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;whats-the-roi-without-modernization&quot;&gt;What’s the ROI Without Modernization?&lt;/h2&gt;
&lt;p&gt;Manual search through 7,432 pages takes 15-30 minutes (median: 25 min). You open PDFs, use Ctrl+F, read context, cross-reference sections. RAG reduces this to 3-5 seconds.&lt;/p&gt;
&lt;p&gt;Assume 10 queries per day during a 6-month migration project. Labor cost: $100/hour (mid-market technical consultant). Time saved: 25 minutes per query. Success rate: 87% (28 of 32 evaluation queries returned useful results; 4 required human review).&lt;/p&gt;
&lt;p&gt;Daily savings: 10 queries × 25 min × ($100/hr ÷ 60) × 87% success rate = &lt;strong&gt;$362/day&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Setup cost: 170 seconds of compute time plus $0 for local embeddings. Query cost: $0.01-0.05 on Amazon Bedrock. Break-even happens in one day.&lt;/p&gt;
&lt;h2 id=&quot;when-does-rag-fail&quot;&gt;When Does RAG Fail?&lt;/h2&gt;
&lt;p&gt;RAG fails on multi-step reasoning, ambiguous questions, and knowledge not in the docs. We’ve seen three failure modes in production.&lt;/p&gt;
&lt;h3 id=&quot;hallucination&quot;&gt;Hallucination&lt;/h3&gt;
&lt;p&gt;The LLM invents answers not in the retrieved chunks. Mitigation: show source citations, add confidence scores, constrain responses to retrieved context only. We display the top 3 source documents with page numbers for every answer.&lt;/p&gt;
&lt;h3 id=&quot;context-overflow&quot;&gt;Context Overflow&lt;/h3&gt;
&lt;p&gt;Complex queries need more context than fits in the LLM’s window. Mitigation: break queries into sub-questions, use query expansion for domain terms, implement multi-hop retrieval for connected concepts.&lt;/p&gt;
&lt;h3 id=&quot;stale-data&quot;&gt;Stale Data&lt;/h3&gt;
&lt;p&gt;Documentation changes but embeddings don’t update. Mitigation: hash-based cache invalidation for PDFs, timestamp-based for markdown files, automated re-indexing on file changes.&lt;/p&gt;
&lt;h3 id=&quot;corpus-limitations&quot;&gt;Corpus Limitations&lt;/h3&gt;
&lt;p&gt;Not all failures are system failures. The evaluation revealed three corpus-related issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Corpus gap&lt;/strong&gt;: Knowledge doesn’t exist (e.g., specific error codes not documented). The system correctly responds “I don’t know.”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scattered information&lt;/strong&gt;: Knowledge exists but spread across sections, making synthesis incomplete.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query formulation&lt;/strong&gt;: Symptom-based queries (“out of memory errors”) outperform code-based queries (“error 1012001”) when exact codes aren’t indexed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are honest limitations, not RAG failures. The mitigation is corpus expansion, not system tuning.&lt;/p&gt;
&lt;h3 id=&quot;what-is-the-overall-failure-rate&quot;&gt;What is the Overall Failure Rate?&lt;/h3&gt;
&lt;p&gt;12.5% of queries (4 of 32) need human review, primarily for multi-step reasoning or ambiguous questions. The alternative is searching 7,432 pages manually. RAG handles the straightforward cases autonomously, while experts focus on edge cases.&lt;/p&gt;



































&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Query Category&lt;/th&gt;&lt;th&gt;Success Rate&lt;/th&gt;&lt;th&gt;Common Failure Mode&lt;/th&gt;&lt;th&gt;Mitigation&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Error lookup&lt;/td&gt;&lt;td&gt;62%&lt;/td&gt;&lt;td&gt;Exact code not in corpus&lt;/td&gt;&lt;td&gt;Symptom-based queries; corpus expansion&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Conceptual&lt;/td&gt;&lt;td&gt;100%&lt;/td&gt;&lt;td&gt;Rare; scattered information&lt;/td&gt;&lt;td&gt;Query expansion with domain terms&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Procedural&lt;/td&gt;&lt;td&gt;100%&lt;/td&gt;&lt;td&gt;None observed (n=8)&lt;/td&gt;&lt;td&gt;Query expansion with command names&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Multi-hop&lt;/td&gt;&lt;td&gt;88%&lt;/td&gt;&lt;td&gt;Knowledge scattered or missing&lt;/td&gt;&lt;td&gt;Corpus expansion; honest “not found”&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 3: 32 scored queries across 4 categories (n=8 each), 98.1% ground truth accuracy, 0% false positive rate. Success rate includes partial matches. (&lt;a href=&quot;https://github.com/clouatre-labs/rag-reranking-benchmarks/tree/main/query-category-eval&quot;&gt;methodology&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The key is transparency. Users see which documents were retrieved, can verify claims, and know when to escalate. Trust comes from citations, not blind faith in LLM outputs.&lt;/p&gt;
&lt;h2 id=&quot;how-do-you-migrate-from-prototype-to-production&quot;&gt;How Do You Migrate from Prototype to Production?&lt;/h2&gt;
&lt;p&gt;We started on OpenRouter’s free tier. Model: Devstral-2512. Cost: $0. Limits: rate-limited, no compliance guarantees. We validated quality with 20-30 test queries.&lt;/p&gt;
&lt;p&gt;Migration to Amazon Bedrock took under 30 minutes. Code changes: swap dependencies (langchain-openai to langchain-aws), replace ChatOpenAI with ChatBedrock, update authentication to use AWS credentials instead of API keys. Benefits: no rate limits, SOC 2 compliance, governance controls, better answer quality from Claude Haiku 4.5.&lt;/p&gt;
&lt;p&gt;The migration path: start small with one document set and one use case. Validate quality with test queries comparing RAG answers to ground truth from source documents. Measure adoption by tracking query volume and user feedback. Iterate by adding more docs, tuning chunking strategy, and improving retrieval.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Migration Path&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; sizes=&quot;(min-width: 381px) 381px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;381&quot; height=&quot;599&quot; src=&quot;/_astro/migration-path.B5mfjlkj_Z2cX3jl.webp&quot; srcset=&quot;/_astro/migration-path.B5mfjlkj_Z2cX3jl.webp 381w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Migration path from free tier validation to enterprise production (iterate on quality before investing in infrastructure)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Scale by building multiple RAG systems for different domains. We run two: one for technical documentation, one for meeting notes and tribal knowledge. Same architecture, different corpora. Total maintenance: under 1 hour per month.&lt;/p&gt;
&lt;h2 id=&quot;what-should-you-do-next&quot;&gt;What Should You Do Next?&lt;/h2&gt;
&lt;p&gt;Identify high-value document sets. Look for onboarding materials, compliance docs, or migration guides. Estimate ROI using queries per day, time saved per query, and hourly labor cost. If the math works, start with a free tier.&lt;/p&gt;
&lt;p&gt;Use OpenRouter or local models for validation. Run 20-30 test queries. Compare RAG answers to ground truth from source documents. Measure accuracy, check for hallucinations, verify source citations. If quality is acceptable, invest in enterprise infrastructure.&lt;/p&gt;
&lt;p&gt;Amazon Bedrock and Azure OpenAI offer compliance, governance, and better models. Cost is $0.01-0.05 per query. For 100 queries per day, that’s $1-5 daily or $30-150 monthly. Compare that to $9,000 in labor savings.&lt;/p&gt;
&lt;p&gt;The decision framework: RAG wins when documentation changes frequently, source citations matter for compliance, or you need operational agility. Fine-tuning wins when knowledge is stable, you need specialized behavior beyond retrieval, or query volume is extreme (thousands per day) with strict latency requirements.&lt;/p&gt;
&lt;p&gt;For legacy systems, RAG delivers ROI without modernization. No need to rewrite docs, migrate databases, or retrain staff. Layer RAG over existing PDFs and get 3-second answers to 20-year-old questions.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;For broader integration patterns and ROI frameworks, see &lt;a href=&quot;/posts/ai-agents-legacy-roi&quot;&gt;AI Agents in Legacy Systems: ROI Without Modernization&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Braintrust, “RAG Evaluation Metrics: How to Evaluate Your RAG Pipeline” (2025) — &lt;a href=&quot;https://www.braintrust.dev/articles/rag-evaluation-metrics&quot;&gt;https://www.braintrust.dev/articles/rag-evaluation-metrics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Clouatre, H., “RAG Reranking Benchmarks: Supplementary Materials” (2026) — &lt;a href=&quot;https://github.com/clouatre-labs/rag-reranking-benchmarks&quot;&gt;https://github.com/clouatre-labs/rag-reranking-benchmarks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;de Luis Balaguer et al., “RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture” (2024) — &lt;a href=&quot;https://arxiv.org/abs/2401.08406&quot;&gt;https://arxiv.org/abs/2401.08406&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dettmers et al., “QLoRA: Efficient Finetuning of Quantized LLMs” (2023) — &lt;a href=&quot;https://arxiv.org/abs/2305.14314&quot;&gt;https://arxiv.org/abs/2305.14314&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Gan et al., “Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2504.14891&quot;&gt;https://arxiv.org/abs/2504.14891&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;George, Sherine, “Enhancing Retrieval-Augmented Generation with Two-Stage Retrieval: FlashRank Reranking and Query Expansion” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2601.03258&quot;&gt;https://arxiv.org/abs/2601.03258&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;LangChain Documentation, “Contextual Compression and Reranking” (2025) — &lt;a href=&quot;https://python.langchain.com/docs/how_to/contextual_compression/&quot;&gt;https://python.langchain.com/docs/how_to/contextual_compression/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Mandikal &amp;#x26; Mooney, “Sparse Meets Dense: A Hybrid Approach to Enhance Scientific Document Retrieval” (2024) — &lt;a href=&quot;https://arxiv.org/abs/2401.04055&quot;&gt;https://arxiv.org/abs/2401.04055&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Oche et al., “A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2507.18910&quot;&gt;https://arxiv.org/abs/2507.18910&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Thunder Compute, “AI GPU Rental Market Trends December 2025: Complete Industry Analysis” (2025) — &lt;a href=&quot;https://www.thundercompute.com/blog/ai-gpu-rental-market-trends&quot;&gt;https://www.thundercompute.com/blog/ai-gpu-rental-market-trends&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-engineering</category><category>legacy-systems</category><category>case-study</category><author>Hugues Clouâtre</author></item><item><title>AI Agents in Legacy Systems: ROI Without Modernization</title><link>https://clouatre.ca/posts/ai-agents-legacy-roi/</link><guid isPermaLink="true">https://clouatre.ca/posts/ai-agents-legacy-roi/</guid><description>Layer AI agents over legacy systems without modernization. 30-80% productivity gains in 3-6 months. Patterns that bypass technical debt.</description><pubDate>Thu, 05 Feb 2026 14:06:00 GMT</pubDate><content:encoded>&lt;p&gt;You run a company with SAP, mainframe, or AS400 systems that work but won’t win awards. The board wants AI. Your team wants modernization budgets. You’re stuck in the middle.&lt;/p&gt;
&lt;p&gt;Every AI agent case study assumes clean APIs, cloud-native apps, and real-time data. Your world is batch jobs, COBOL, and integration layers built in 2003. The conventional answer is “modernize first, then AI.” That’s a 2-5 year, $5M-$50M bet before you prove a single dollar of AI value.&lt;/p&gt;
&lt;p&gt;This post shows where AI agents make economic sense &lt;em&gt;on top of&lt;/em&gt; legacy systems, how to measure ROI without enterprise-wide transformation, and which integrations work when your data lives in places LLMs have never heard of. You’ll walk away with a decision framework for identifying agent use cases that pay back in quarters, not years.&lt;/p&gt;
&lt;h2 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;&lt;details&gt;&lt;summary&gt;Contents&lt;/summary&gt;&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-legacy-systems-became-the-1-ai-adoption-obstacle&quot;&gt;Why Legacy Systems Became the #1 AI Adoption Obstacle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-reverse-modernization-strategy-layer-ai-first-upgrade-later&quot;&gt;The Reverse Modernization Strategy: Layer AI First, Upgrade Later&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-this-works&quot;&gt;Why This Works&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#when-reverse-modernization-doesnt-apply&quot;&gt;When Reverse Modernization Doesn’t Apply&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-do-ai-agents-actually-integrate-with-legacy-systems&quot;&gt;How Do AI Agents Actually Integrate with Legacy Systems?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#api-mediation-layer&quot;&gt;API Mediation Layer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#event-driven-architecture&quot;&gt;Event-Driven Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#model-context-protocol-mcp&quot;&gt;Model Context Protocol (MCP)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#which-pattern-should-you-choose&quot;&gt;Which Pattern Should You Choose?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#when-to-consider-platform-level-solutions&quot;&gt;When to Consider Platform-Level Solutions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-observability-infrastructure-is-non-negotiable&quot;&gt;Why Observability Infrastructure Is Non-Negotiable&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-log-everything&quot;&gt;Why Log Everything?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#integration-health-metrics&quot;&gt;Integration Health Metrics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#agent-performance-metrics&quot;&gt;Agent Performance Metrics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-do-you-measure-decision-accuracy&quot;&gt;How Do You Measure Decision Accuracy?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#business-impact-metrics&quot;&gt;Business Impact Metrics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-roi-can-you-actually-expect&quot;&gt;What ROI Can You Actually Expect?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-do-40-of-ai-agent-projects-still-fail&quot;&gt;Why Do 40% of AI Agent Projects Still Fail?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-to-start-a-practical-implementation-framework&quot;&gt;How to Start: A Practical Implementation Framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-should-your-board-fund-this-now&quot;&gt;Why Should Your Board Fund This Now?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;&lt;/details&gt;&lt;p&gt;&lt;/p&gt;
&lt;h2 id=&quot;why-legacy-systems-became-the-1-ai-adoption-obstacle&quot;&gt;Why Legacy Systems Became the #1 AI Adoption Obstacle&lt;/h2&gt;
&lt;p&gt;Legacy systems top the list of AI adoption obstacles, but the conventional fix is worse than the problem. Traditional modernization projects require multi-year timelines and eight-figure budgets before proving a single dollar of AI value. No wonder &lt;a href=&quot;https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027&quot;&gt;40% of agentic AI projects will be canceled by 2027&lt;/a&gt; (Gartner, 2025) due to escalating costs and unclear business value.&lt;/p&gt;
&lt;p&gt;The real bottleneck isn’t legacy systems. It’s the false choice between “modernize everything” and “do nothing.” You need integration patterns that work with what you have.&lt;/p&gt;
&lt;h2 id=&quot;the-reverse-modernization-strategy-layer-ai-first-upgrade-later&quot;&gt;The Reverse Modernization Strategy: Layer AI First, Upgrade Later&lt;/h2&gt;
&lt;p&gt;Layer AI agents over existing systems first. Capture ROI in months, then fund selective modernization. Prove value before investing in infrastructure.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.weforum.org/stories/2026/01/ai-mid-market-business-growth/&quot;&gt;Atera reduced sales response times by 60%&lt;/a&gt; (World Economic Forum, 2026) by integrating AI agents with their existing CRM and ticketing systems. They didn’t rebuild their infrastructure. They built a layer on top. &lt;a href=&quot;https://autorfp.ai/blog/rfp-ai-agents-revolutionizing-how-companies-win-more-deals-in-less-time&quot;&gt;Armis accelerated RFP response capacity by 73%&lt;/a&gt; (AutoRFP, 2026) without adding headcount. Both companies proved the business case before investing in modernization.&lt;/p&gt;
&lt;h3 id=&quot;why-this-works&quot;&gt;Why This Works&lt;/h3&gt;
&lt;p&gt;Reverse modernization works because legacy systems keep running (no disruption), ROI arrives in 3-6 months (30-80% productivity gains), and you start small: one workflow, one team, one agent.&lt;/p&gt;
&lt;p&gt;Modernization is a binary bet. Agent layering is incremental: prove value, fund upgrades, repeat.&lt;/p&gt;
&lt;h3 id=&quot;when-reverse-modernization-doesnt-apply&quot;&gt;When Reverse Modernization Doesn’t Apply&lt;/h3&gt;
&lt;p&gt;Three scenarios require modernization first. &lt;strong&gt;End-of-life systems&lt;/strong&gt; without vendor support expose you to &lt;a href=&quot;https://cybersnowden.com/difference-between-end-of-life-and-legacy-cyber-security/&quot;&gt;compliance violations and security breaches&lt;/a&gt; (Cyber Snowden, 2026). Agent integration can’t fix missing security patches. &lt;strong&gt;Regulatory mandates&lt;/strong&gt; that explicitly require infrastructure upgrades (e.g., PCI-DSS 4.0, GDPR data residency) make layering non-compliant. &lt;strong&gt;Systems scheduled for decommissioning&lt;/strong&gt; within 12 months don’t justify integration investment. In these cases, accelerate modernization or sunset the system entirely.&lt;/p&gt;
&lt;p&gt;For everything else, reverse modernization applies.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Reverse modernization flow showing AI agents layered first, generating ROI, then funding selective infrastructure upgrades&quot; loading=&quot;eager&quot; decoding=&quot;sync&quot; fetchpriority=&quot;high&quot; sizes=&quot;(min-width: 216px) 216px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;216&quot; height=&quot;622&quot; src=&quot;/_astro/reverse-modernization-flow.DBUbAqVb_FyUBb.webp&quot; srcset=&quot;/_astro/reverse-modernization-flow.DBUbAqVb_FyUBb.webp 216w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Reverse modernization flow (agents first, infrastructure later)&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;how-do-ai-agents-actually-integrate-with-legacy-systems&quot;&gt;How Do AI Agents Actually Integrate with Legacy Systems?&lt;/h2&gt;
&lt;p&gt;But integration is where most projects fail. Agents need access to data and business logic buried in legacy systems. You have three options. Each has tradeoffs.&lt;/p&gt;
&lt;h3 id=&quot;api-mediation-layer&quot;&gt;API Mediation Layer&lt;/h3&gt;
&lt;p&gt;Build a facade that abstracts legacy complexity. Agents interact with clean, modern interfaces while the mediation layer handles authentication, data translation (EBCDIC to UTF-8, fixed-width to JSON), and error handling. When the legacy system changes, you update the facade, not the agents. You also get a single point for logging, monitoring, and compliance audits.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; fastapi &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; FastAPI&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; pydantic &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; BaseModel&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;class&lt;/span&gt;&lt;span style=&quot;--shiki-light:#DF8E1D;--shiki-light-font-style:italic;--shiki-dark:#EED49F;--shiki-dark-font-style:italic&quot;&gt; Customer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#DF8E1D;--shiki-light-font-style:italic;--shiki-dark:#EED49F;--shiki-dark-font-style:italic&quot;&gt;BaseModel&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;):&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;  # Modern JSON schema&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;    id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; str&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; str&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    balance&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; float&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;@app&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;get&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;/customers/&lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;{customer_id}&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; response_model&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;Customer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;async&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; def&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt; get_customer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;customer_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; str&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; -&gt;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; Customer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    raw &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; legacy_client&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;call&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;CUSTINQ&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; customer_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;ljust&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;10&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    return&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; Customer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;        id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;raw&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;10&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;].&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;strip&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;        name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;raw&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;10&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;40&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;].&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;encode&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;cp037&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;).&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;decode&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;utf-8&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;),&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;  # EBCDIC to UTF-8  &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;        balance&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt;int&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;raw&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;40&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;52&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;])&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; /&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 100&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;  # Packed decimal&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    )&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;mediation/legacy_facade.py&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 1: FastAPI facade translates COBOL fixed-width records to validated JSON.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;event-driven-architecture&quot;&gt;Event-Driven Architecture&lt;/h3&gt;
&lt;p&gt;Legacy systems publish state changes through &lt;strong&gt;Dapr&lt;/strong&gt;, which supports Kafka, Azure Event Hub, and others. Agents subscribe and react in near real-time. This pattern scales better than API mediation: the system pushes updates when they matter instead of agents polling constantly. &lt;strong&gt;Dapr’s abstraction avoids vendor lock-in.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The tradeoff: you need to instrument the legacy system to publish events, which isn’t trivial if the system is old and undocumented.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; dapr&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;ext&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;grpc &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; App&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; cloudevents&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;sdk&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;event &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; v1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; json&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;app &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; App&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;()&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;@app&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;subscribe&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;pubsub_name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;legacy-events&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; topic&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;orders&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;def&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt; handle_order&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;event&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-dark:#EE99A0&quot;&gt; v1&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-dark:#EE99A0&quot;&gt;Event&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; -&gt;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; None&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    data &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; json&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;loads&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;event&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;Data&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;())&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;  # CloudEvents envelope  &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;    # Agent processes order without polling legacy system&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    agent&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;process_order&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;data&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;order_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;],&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; data&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;customer_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;])&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;app&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;run&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;6002&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;events/order_subscriber.py&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 2: Dapr subscriber receives legacy system events via CloudEvents (swap Kafka/RabbitMQ/Azure without code changes).&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;model-context-protocol-mcp&quot;&gt;Model Context Protocol (MCP)&lt;/h3&gt;
&lt;p&gt;Anthropic’s open standard for agent-to-data connections. You write one MCP server for your legacy system, and any agent can use it. No custom integration code for each agent. This matters when coordinating multiple agents, a problem I’ve written about in &lt;a href=&quot;/posts/orchestrating-ai-agents-subagent-architecture&quot;&gt;orchestrating multiple AI agents with subagent architecture&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; fastmcp &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; FastMCP  &lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# FastMCP 3.0  &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;mcp &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; FastMCP&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;Legacy ERP&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;@mcp&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-light-font-style:italic;--shiki-dark:#F5A97F;--shiki-dark-font-style:italic&quot;&gt;tool&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;def&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt; query_customer&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;customer_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; str&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; -&gt;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; str&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;    &quot;&quot;&quot;Query customer from mainframe. Any MCP-compatible agent can call this.&quot;&quot;&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    result &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; mainframe_client&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;execute&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;f&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;SELECT * FROM CUSTMAST WHERE ID=&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;{&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;customer_id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;}&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    return&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; json&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;dumps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;result&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;if&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; __name__ &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;==&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;__main__&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    mcp&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;run&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;transport&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;http&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; port&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;8000&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;  # Remote agents connect via HTTP&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;mcp/legacy_server.py&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 3: FastMCP 3.0 server exposes legacy data to any MCP-compatible agent (one server, many agents).&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;which-pattern-should-you-choose&quot;&gt;Which Pattern Should You Choose?&lt;/h3&gt;

























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Pattern&lt;/th&gt;&lt;th&gt;Best When&lt;/th&gt;&lt;th&gt;Timeline&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;API Mediation&lt;/td&gt;&lt;td&gt;Stable APIs, 1-2 agents, tight control needed&lt;/td&gt;&lt;td&gt;4-8 weeks&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Event-Driven&lt;/td&gt;&lt;td&gt;1,000+ transactions/hour, sub-second response&lt;/td&gt;&lt;td&gt;8-12 weeks&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;MCP&lt;/td&gt;&lt;td&gt;3+ agents, standardization priority&lt;/td&gt;&lt;td&gt;6-12 weeks&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 1: Integration pattern selection guide&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Three integration patterns: API Mediation Layer (facade pattern), Event-Driven Architecture (message bus), and Model Context Protocol (MCP servers)&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; sizes=&quot;(min-width: 784px) 784px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;784&quot; height=&quot;584&quot; src=&quot;/_astro/integration-patterns.wK2CV7Mk_1gHy23.webp&quot; srcset=&quot;/_astro/integration-patterns.wK2CV7Mk_1XggwJ.webp 640w, /_astro/integration-patterns.wK2CV7Mk_24Ecey.webp 750w, /_astro/integration-patterns.wK2CV7Mk_1gHy23.webp 784w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Three integration patterns for legacy systems&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;when-to-consider-platform-level-solutions&quot;&gt;When to Consider Platform-Level Solutions&lt;/h3&gt;
&lt;p&gt;Enterprise platforms like Palantir Foundry take a different approach: containerizing legacy code itself through Foundry Container Engine (FCE). This “lift and shift” strategy &lt;a href=&quot;https://blog.palantir.com/safely-modernize-legacy-systems-with-palantir-foundry-container-engine-fce-d8900464da7c&quot;&gt;runs your COBOL or Fortran logic in a modern environment&lt;/a&gt; (Palantir, 2023) without rewriting it.&lt;/p&gt;
&lt;p&gt;The tradeoff is vendor commitment. Palantir engagements typically start at $1M+/year and require dedicated integration teams. For organizations with complex, multi-system landscapes and enterprise budgets, this can make sense.&lt;/p&gt;
&lt;p&gt;The integration patterns in this post work differently. You build lightweight facades around legacy systems using open tools (FastAPI, Dapr, MCP). Implementation takes weeks, not quarters. You prove value before committing to platforms.&lt;/p&gt;
&lt;p&gt;These approaches complement each other. Demonstrate ROI with integration patterns, then justify platform investments for deeper modernization. &lt;a href=&quot;https://www.deloitte.com/us/en/Industries/government-public/perspectives/deloitte-palantir-collaboration.html&quot;&gt;Deloitte calls this “data-first ERP modernization”&lt;/a&gt; (Deloitte, 2025).&lt;/p&gt;
&lt;h2 id=&quot;why-observability-infrastructure-is-non-negotiable&quot;&gt;Why Observability Infrastructure Is Non-Negotiable&lt;/h2&gt;
&lt;p&gt;Whatever integration pattern you choose, log everything. Every integration call. Every agent decision. Every error. This isn’t optional.&lt;/p&gt;
&lt;h3 id=&quot;why-log-everything&quot;&gt;Why Log Everything?&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Compliance&lt;/strong&gt;: Auditors ask “why did your agent approve this transaction?” You need logs showing the decision path.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Debugging&lt;/strong&gt;: When an agent fails, trace which integration call failed, what data it received, and why it made the wrong decision.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improvement&lt;/strong&gt;: You can’t optimize what you don’t measure.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Track three categories of metrics.&lt;/p&gt;
&lt;h3 id=&quot;integration-health-metrics&quot;&gt;Integration Health Metrics&lt;/h3&gt;
&lt;p&gt;Monitor API latency (p50, p95, p99), error rates by type, and timeout frequency—metrics that align with &lt;a href=&quot;https://opentelemetry.io/docs/specs/semconv/gen-ai/&quot;&gt;OpenTelemetry GenAI conventions&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;agent-performance-metrics&quot;&gt;Agent Performance Metrics&lt;/h3&gt;
&lt;p&gt;These require deliberate instrumentation design. &lt;strong&gt;Task completion rate&lt;/strong&gt; needs a definition of “complete” per task type (e.g., “ticket resolved without escalation”). &lt;strong&gt;User satisfaction&lt;/strong&gt; comes from thumbs up/down on responses, escalation rate, and support ticket correlation. &lt;strong&gt;Decision accuracy&lt;/strong&gt; is the hardest to measure, see below.&lt;/p&gt;
&lt;h3 id=&quot;how-do-you-measure-decision-accuracy&quot;&gt;How Do You Measure Decision Accuracy?&lt;/h3&gt;
&lt;p&gt;Ground truth is often available. The question is where to find it.&lt;/p&gt;
&lt;p&gt;For &lt;strong&gt;RAG systems&lt;/strong&gt;, use &lt;a href=&quot;/posts/rag-legacy-systems/#what-is-the-overall-failure-rate&quot;&gt;categorized query benchmarks with validation subsets&lt;/a&gt;. Test accuracy by query type (conceptual, procedural, error lookup, multi-hop) since each fails differently.&lt;/p&gt;
&lt;p&gt;For &lt;strong&gt;approval workflows&lt;/strong&gt;, compare agent decisions against eventual outcomes. Was the approved invoice paid? Was the flagged transaction actually fraudulent? The business process itself provides ground truth.&lt;/p&gt;
&lt;p&gt;When &lt;strong&gt;ground truth is unavailable&lt;/strong&gt;, sample decisions for human or AI-assisted review. The question is: how many?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sample 300-400 decisions monthly.&lt;/strong&gt; This achieves +/-5% margin of error at 95% confidence &lt;a href=&quot;https://en.wikipedia.org/wiki/Sample_size_determination#Estimation_of_a_proportion&quot;&gt;regardless of total volume&lt;/a&gt; (Cochran, 1977). The math: &lt;code&gt;n = (1.96² x 0.5 x 0.5) / 0.05² = 385&lt;/code&gt;. For systems under 500 decisions/month, review all or accept wider uncertainty.&lt;/p&gt;
&lt;h3 id=&quot;business-impact-metrics&quot;&gt;Business Impact Metrics&lt;/h3&gt;
&lt;p&gt;Calculate these monthly against pre-deployment baselines. These are not real-time dashboard metrics:&lt;/p&gt;





















&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Formula&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Response time reduction&lt;/td&gt;&lt;td&gt;Agent-handled avg vs. pre-deployment baseline&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Throughput increase&lt;/td&gt;&lt;td&gt;Tickets/hour after vs. before deployment&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Cost savings&lt;/td&gt;&lt;td&gt;(Hours saved x labor cost) - (API costs + infrastructure)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 2: Business impact calculation formulas&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;These metrics prove ROI and guide your next investments. For a complete observability implementation guide, see &lt;a href=&quot;/posts/ai-observability-gaps/&quot;&gt;closing the AI observability gap&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;what-roi-can-you-actually-expect&quot;&gt;What ROI Can You Actually Expect?&lt;/h2&gt;
&lt;p&gt;The numbers are compelling. Let me walk through real examples.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://newsroom.bankofamerica.com/content/newsroom/press-releases/2025/08/a-decade-of-ai-innovation--bofa-s-virtual-assistant-erica-surpas.html&quot;&gt;Bank of America’s Erica reduced IT service desk calls by 50%&lt;/a&gt; (Bank of America, 2025) across 213,000 employees. &lt;a href=&quot;https://blog.superhuman.com/ai-agent-useful-case-studies/&quot;&gt;Insurance companies using agentic AI reduced claims processing time from 9.6 days to 3.2 days&lt;/a&gt; (Superhuman, 2026), a 67% reduction.&lt;/p&gt;
&lt;p&gt;Atera’s 60% improvement in sales response times translates to faster deal closure. Armis’s 73% increase in RFP response capacity means the same team handles 73% more business. Both captured these gains without hiring.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.bcg.com/publications/2026/agentic-ai-power-core-insurance-ai-modernization&quot;&gt;BCG reports AI can reduce core insurance modernization costs by 30-50%&lt;/a&gt; (BCG, 2026). Agents pay for themselves, then fund the upgrades.&lt;/p&gt;



































&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Company&lt;/th&gt;&lt;th&gt;Pattern&lt;/th&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Result&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Atera&lt;/td&gt;&lt;td&gt;API Mediation&lt;/td&gt;&lt;td&gt;Sales Response Time&lt;/td&gt;&lt;td&gt;60% reduction&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Armis&lt;/td&gt;&lt;td&gt;API Mediation&lt;/td&gt;&lt;td&gt;RFP Response Capacity&lt;/td&gt;&lt;td&gt;73% increase&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Bank of America&lt;/td&gt;&lt;td&gt;API Mediation&lt;/td&gt;&lt;td&gt;IT Service Desk Calls&lt;/td&gt;&lt;td&gt;50% reduction&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Insurance Industry&lt;/td&gt;&lt;td&gt;Event-Driven&lt;/td&gt;&lt;td&gt;Claims Processing Time&lt;/td&gt;&lt;td&gt;67% reduction (9.6 -&gt; 3.2 days)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 3: ROI examples across integration patterns (note: API Mediation dominates early wins due to faster implementation)&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;why-do-40-of-ai-agent-projects-still-fail&quot;&gt;Why Do 40% of AI Agent Projects Still Fail?&lt;/h2&gt;
&lt;p&gt;Projects fail when teams skip fundamentals.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Unclear Business Value.&lt;/strong&gt; &lt;a href=&quot;https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027&quot;&gt;Gartner predicts over 40% of agentic AI projects will be canceled by 2027&lt;/a&gt; due to escalating costs, unclear value, and inadequate risk controls. Launching with vague goals like “improve productivity” makes it impossible to measure success. Define exact metrics before development: “Reduce invoice processing time from 8 days to 2 days while maintaining 99.5% accuracy” is measurable; “handle invoices better” is not.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Security Vulnerabilities.&lt;/strong&gt; &lt;a href=&quot;https://genai.owasp.org/llmrisk/llm01-prompt-injection/&quot;&gt;Prompt injection attacks are ranked #1 in OWASP 2025 Top 10 for LLMs&lt;/a&gt; (OWASP, 2025). Treat agents as privileged service accounts with these controls:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tool allowlisting (no arbitrary network/file access)&lt;/li&gt;
&lt;li&gt;Schema validation on tool inputs/outputs&lt;/li&gt;
&lt;li&gt;Output sanitization (no untrusted content forwarded)&lt;/li&gt;
&lt;li&gt;Secrets isolation (no secrets in prompts; short-lived tokens)&lt;/li&gt;
&lt;li&gt;Rate limiting + anomaly detection&lt;/li&gt;
&lt;li&gt;Approval gates for high-impact actions&lt;/li&gt;
&lt;li&gt;Audit logs (immutable, centralized)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A compromised agent can make thousands of requests per minute. For deeper coverage, see &lt;a href=&quot;/posts/ai-supply-chain-attack-vectors&quot;&gt;AI supply chain security risks&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Governance Retrofitting.&lt;/strong&gt; Adding compliance controls after deployment requires painful redesigns. Plan audit trails, role-based access, and compliance testing from the start.&lt;/p&gt;
&lt;h2 id=&quot;how-to-start-a-practical-implementation-framework&quot;&gt;How to Start: A Practical Implementation Framework&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Step 1: Set Specific Targets.&lt;/strong&gt; Pick one workflow with high volume, predictable, with success metrics, and low regulatory risk. Good first candidates: customer support routing, RFP response compilation, IT service desk triage, or invoice processing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 2: Audit Data Quality.&lt;/strong&gt; Check for duplicates, format inconsistencies, missing values, and access permissions. Fix the top three issues. Aim for 80% clean data, not perfection.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 3: Choose Your Integration Pattern.&lt;/strong&gt; API mediation for stable APIs and 1-2 agents (4-8 weeks). Event-driven for 1,000+ transactions/hour (8-12 weeks). MCP for 3+ agents or standardization priority (6-12 weeks).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 4: Build Observability from Day One.&lt;/strong&gt; Track integration health (latency, error rates), agent performance (completion rate, accuracy), and business impact (your target KPI). Set alerts for error rates above 5%, latency p95 spikes, and approval override surges.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 5: Start with Single-Agent Workflows.&lt;/strong&gt; Run your first agent in shadow mode for 2-4 weeks (longer for complex workflows). Compare agent decisions against human decisions. &lt;strong&gt;Exit criteria:&lt;/strong&gt; Switch to production when accuracy exceeds 95% (adjust based on cost-of-failure analysis). &lt;strong&gt;Production gate:&lt;/strong&gt; Error rate below 5% and manual override rate trending down. &lt;strong&gt;Expansion gate:&lt;/strong&gt; KPI sustained for 4 weeks with no Sev-1 incidents.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 6: Fund Modernization with Agent ROI.&lt;/strong&gt; Track which legacy systems create the most integration friction. If agents generate $200K annual savings, allocate 30-50% to infrastructure upgrades. This creates a self-funding cycle.&lt;/p&gt;
&lt;h2 id=&quot;why-should-your-board-fund-this-now&quot;&gt;Why Should Your Board Fund This Now?&lt;/h2&gt;
&lt;p&gt;ROI before major infrastructure investment: your board sees results next quarter, not in three years.&lt;/p&gt;
&lt;p&gt;Competitive advantage: while competitors wait for budgets, you’re winning deals with faster response times, higher throughput, and lower support costs. Risk mitigation: agents layer over existing systems with no business disruption.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Explore how &lt;a href=&quot;/posts/orchestrating-ai-agents-subagent-architecture&quot;&gt;subagent architectures can orchestrate multiple AI agents without coordination complexity&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;AutoRFP, “RFP AI Agents: Revolutionizing How Companies Win More Deals in Less Time” (2026) — &lt;a href=&quot;https://autorfp.ai/blog/rfp-ai-agents-revolutionizing-how-companies-win-more-deals-in-less-time&quot;&gt;https://autorfp.ai/blog/rfp-ai-agents-revolutionizing-how-companies-win-more-deals-in-less-time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Bank of America, “A Decade of AI Innovation: Erica Surpasses Milestones” (2025) — &lt;a href=&quot;https://newsroom.bankofamerica.com/content/newsroom/press-releases/2025/08/a-decade-of-ai-innovation--bofa-s-virtual-assistant-erica-surpas.html&quot;&gt;https://newsroom.bankofamerica.com/content/newsroom/press-releases/2025/08/a-decade-of-ai-innovation—bofa-s-virtual-assistant-erica-surpas.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;BCG, “Agentic AI Power Core Insurance AI Modernization” (2026) — &lt;a href=&quot;https://www.bcg.com/publications/2026/agentic-ai-power-core-insurance-ai-modernization&quot;&gt;https://www.bcg.com/publications/2026/agentic-ai-power-core-insurance-ai-modernization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Cochran, W.G., &lt;em&gt;Sampling Techniques&lt;/em&gt;, 3rd ed. (1977) — &lt;a href=&quot;https://en.wikipedia.org/wiki/Sample_size_determination&quot;&gt;https://en.wikipedia.org/wiki/Sample_size_determination&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Cyber Snowden, “Difference Between End of Life and Legacy Cyber Security” (2026) — &lt;a href=&quot;https://cybersnowden.com/difference-between-end-of-life-and-legacy-cyber-security/&quot;&gt;https://cybersnowden.com/difference-between-end-of-life-and-legacy-cyber-security/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Deloitte, “Deloitte &amp;#x26; Palantir: Driving Value in Enterprise Operations” (2025) — &lt;a href=&quot;https://www.deloitte.com/us/en/Industries/government-public/perspectives/deloitte-palantir-collaboration.html&quot;&gt;https://www.deloitte.com/us/en/Industries/government-public/perspectives/deloitte-palantir-collaboration.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Gartner, “Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027” (2025) — &lt;a href=&quot;https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027&quot;&gt;https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenTelemetry, “Semantic Conventions for Generative AI” (2024) — &lt;a href=&quot;https://opentelemetry.io/docs/specs/semconv/gen-ai/&quot;&gt;https://opentelemetry.io/docs/specs/semconv/gen-ai/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OWASP, “LLM01:2025 Prompt Injection” (2025) — &lt;a href=&quot;https://genai.owasp.org/llmrisk/llm01-prompt-injection/&quot;&gt;https://genai.owasp.org/llmrisk/llm01-prompt-injection/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Palantir, “Safely Modernize Legacy Systems with Palantir Foundry Container Engine (FCE)” (2023) — &lt;a href=&quot;https://blog.palantir.com/safely-modernize-legacy-systems-with-palantir-foundry-container-engine-fce-d8900464da7c&quot;&gt;https://blog.palantir.com/safely-modernize-legacy-systems-with-palantir-foundry-container-engine-fce-d8900464da7c&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Superhuman, “AI Agent Useful Case Studies” (2026) — &lt;a href=&quot;https://blog.superhuman.com/ai-agent-useful-case-studies/&quot;&gt;https://blog.superhuman.com/ai-agent-useful-case-studies/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;World Economic Forum, “AI Mid-Market Business Growth” (2026) — &lt;a href=&quot;https://www.weforum.org/stories/2026/01/ai-mid-market-business-growth/&quot;&gt;https://www.weforum.org/stories/2026/01/ai-mid-market-business-growth/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-engineering</category><category>legacy-systems</category><category>architecture</category><category>case-study</category><author>Hugues Clouâtre</author></item><item><title>AI Supply Chain Attacks: New Vectors in Your Dependencies</title><link>https://clouatre.ca/posts/ai-supply-chain-attack-vectors/</link><guid isPermaLink="true">https://clouatre.ca/posts/ai-supply-chain-attack-vectors/</guid><description>Slopsquatting: attackers register packages AI hallucinates. XZ Utils showed the stakes. A framework to assess your AI supply chain exposure.</description><pubDate>Wed, 11 Feb 2026 20:45:00 GMT</pubDate><content:encoded>&lt;p&gt;A CI pipeline trusts 400 packages. Last week, one of them laid off 75% of its engineering team. Two years ago, another nearly shipped a backdoor to every major Linux distribution. Attackers are now registering package names that only exist because an AI hallucinated them.&lt;/p&gt;
&lt;p&gt;Three incidents. Three attack vectors. One common thread: AI is reshaping software supply chain risk faster than most security programs can adapt.&lt;/p&gt;
&lt;p&gt;Most organizations scan for CVEs and maintain an SBOM. Few monitor for maintainer burnout, AI-driven revenue collapse, or packages that only exist because an LLM invented them. The following framework closes those gaps.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Three AI-driven attack vectors targeting a software dependency chain&quot; loading=&quot;eager&quot; decoding=&quot;sync&quot; fetchpriority=&quot;high&quot; sizes=&quot;(min-width: 769px) 769px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;769&quot; height=&quot;492&quot; src=&quot;/_astro/ai-attack-vectors.BivfKOCI_ZMuuUA.webp&quot; srcset=&quot;/_astro/ai-attack-vectors.BivfKOCI_23Tohj.webp 640w, /_astro/ai-attack-vectors.BivfKOCI_25J9Jf.webp 750w, /_astro/ai-attack-vectors.BivfKOCI_ZMuuUA.webp 769w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Three AI-driven attack vectors targeting a software dependency chain.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;&lt;details&gt;&lt;summary&gt;Contents&lt;/summary&gt;&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#what-are-the-three-attack-vectors&quot;&gt;What Are the Three Attack Vectors?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#vector-1-maintainer-collapse-ai-accelerated&quot;&gt;Vector 1: Maintainer Collapse (AI-Accelerated)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#vector-2-social-engineering-of-solo-maintainers&quot;&gt;Vector 2: Social Engineering of Solo Maintainers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#vector-3-slopsquatting-ai-native&quot;&gt;Vector 3: Slopsquatting (AI-Native)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-does-this-matter-at-enterprise-scale&quot;&gt;Why Does This Matter at Enterprise Scale?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-does-the-two-tier-ecosystem-create-different-risks&quot;&gt;How Does the Two-Tier Ecosystem Create Different Risks?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#tier-1-foundation-backed-projects&quot;&gt;Tier 1: Foundation-Backed Projects&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#tier-2-indie-and-vc-backed-projects&quot;&gt;Tier 2: Indie and VC-Backed Projects&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-should-organizations-assess-ai-exposure-risk&quot;&gt;How Should Organizations Assess AI Exposure Risk?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-5-signal-ai-exposure-audit&quot;&gt;The 5-Signal AI Exposure Audit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#decision-thresholds&quot;&gt;Decision Thresholds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#enterprise-application&quot;&gt;Enterprise Application&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-does-vibe-coding-multiply-supply-chain-risk&quot;&gt;How Does Vibe Coding Multiply Supply Chain Risk?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-can-organizations-defend-against-ai-supply-chain-attacks&quot;&gt;How Can Organizations Defend Against AI Supply Chain Attacks?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#extend-the-toolchain&quot;&gt;Extend the Toolchain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#sponsor-strategically&quot;&gt;Sponsor Strategically&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-will-relicensing-reshape-the-stack&quot;&gt;How Will Relicensing Reshape the Stack?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#before-the-next-sprint&quot;&gt;Before the Next Sprint&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;&lt;/details&gt;&lt;p&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-are-the-three-attack-vectors&quot;&gt;What Are the Three Attack Vectors?&lt;/h2&gt;
&lt;p&gt;AI does not just accelerate existing supply chain risks. It creates new ones.&lt;/p&gt;
&lt;h3 id=&quot;vector-1-maintainer-collapse-ai-accelerated&quot;&gt;Vector 1: Maintainer Collapse (AI-Accelerated)&lt;/h3&gt;
&lt;p&gt;On January 6, 2026, Adam Wathan attributed Tailwind’s layoffs directly to &lt;a href=&quot;https://github.com/tailwindlabs/tailwindcss.com/pull/2388#issuecomment-3717222957&quot;&gt;AI’s brutal impact on their business&lt;/a&gt;. Documentation traffic dropped 40%. Revenue collapsed 80%. Yet Tailwind CSS downloads keep climbing.&lt;/p&gt;
&lt;p&gt;The mechanism is simple: developers ask Copilot for a Tailwind grid layout. The AI generates it. No documentation visit. No discovery of Tailwind UI. No conversion. The developer gets value. The maintainer gets nothing.&lt;/p&gt;
&lt;p&gt;This is not a failing product. It is a failing business model, and not unique to Tailwind. Any project monetizing through documentation traffic faces the same exposure.&lt;/p&gt;
&lt;p&gt;Meanwhile, curl maintainer Daniel Stenberg &lt;a href=&quot;https://arstechnica.com/security/2026/01/overrun-with-ai-slop-curl-scraps-bug-bounties-to-ensure-intact-mental-health/&quot;&gt;scrapped the project’s bug bounty program&lt;/a&gt; on January 21, 2026, citing “intact mental health.” Twenty AI-generated vulnerability reports flooded HackerOne in January alone. None identified actual vulnerabilities. Researchers paste code into LLMs, submit the hallucinated analysis, then loop follow-up questions through the same models. The result: maintainers spend hours triaging garbage instead of shipping code.&lt;/p&gt;
&lt;p&gt;Different mechanism, same outcome. Two forces compete: AI lowers the cost of writing software (beneficial for maintainers), but it also diverts users away from the direct engagement that funds maintainers (unsustainable for business models). Koren et al. (2026) model this formally and show the demand-diversion channel dominates. Stack Overflow activity has declined roughly 25% since ChatGPT’s launch, following the same pattern as Tailwind.&lt;/p&gt;
&lt;h3 id=&quot;vector-2-social-engineering-of-solo-maintainers&quot;&gt;Vector 2: Social Engineering of Solo Maintainers&lt;/h3&gt;
&lt;p&gt;In March 2024, Microsoft engineer Andres Freund &lt;a href=&quot;https://www.crowdstrike.com/en-us/blog/cve-2024-3094-xz-upstream-supply-chain-attack/&quot;&gt;discovered a backdoor in XZ Utils&lt;/a&gt; days before it would have shipped to most Linux distributions. CVE-2024-3094 scored a perfect 10.0. The backdoor enabled remote code execution through SSH on affected systems.&lt;/p&gt;
&lt;p&gt;The attack took two years. A contributor using the name “Jia Tan” gained the sole maintainer’s trust through legitimate contributions, then inserted malicious code. The maintainer was burned out, working alone, grateful for help.&lt;/p&gt;
&lt;p&gt;This pattern repeats. The event-stream incident in 2018 followed the same playbook: abandoned maintainer transfers control, attacker inserts cryptocurrency-stealing code. XZ Utils proved the technique works at infrastructure scale.&lt;/p&gt;
&lt;p&gt;Two years later, the technique has evolved. Attackers now use LLMs to maintain technically helpful, perfectly patient personas over months, bypassing the “vibe check” that once caught human bad actors.&lt;/p&gt;
&lt;h3 id=&quot;vector-3-slopsquatting-ai-native&quot;&gt;Vector 3: Slopsquatting (AI-Native)&lt;/h3&gt;
&lt;p&gt;This attack vector did not exist before LLMs. &lt;a href=&quot;https://snyk.io/articles/slopsquatting-mitigation-strategies/&quot;&gt;Slopsquatting&lt;/a&gt; exploits models that confidently recommend nonexistent packages. One in five AI suggestions points to a package that was never published (Spracklen et al., 2025).&lt;/p&gt;
&lt;p&gt;The attack flow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Researchers run popular LLMs and collect hallucinated package names&lt;/li&gt;
&lt;li&gt;Attackers register those names on npm, PyPI, or RubyGems with malicious payloads&lt;/li&gt;
&lt;li&gt;Developers install AI-suggested packages without validation&lt;/li&gt;
&lt;li&gt;Malicious code executes&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Unlike typosquatting, attackers do not need to guess which names developers might mistype. The model identifies exactly which fake packages to register. Names like “aws-helper-sdk” and “fastapi-middleware” appear in AI-generated code but never existed until attackers registered them.&lt;/p&gt;
&lt;p&gt;The defense is straightforward: verify that packages existed before the commit date. This check catches AI-hallucinated packages that attackers registered after the model suggested them:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; requests&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; datetime &lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; datetime&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;def&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt; check_package_age&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; str&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt; commit_date&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; str&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; -&gt;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-light-font-style:italic;--shiki-dark:#C6A0F6;--shiki-dark-font-style:italic&quot;&gt; bool&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    resp &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; requests&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;get&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;f&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;https://registry.npmjs.org/&lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;{&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;}&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    if&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; resp&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;status_code &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;==&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 404&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;        return&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; False&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;  # Package does not exist&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    pkg &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; resp&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;json&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;()&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;    published &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; datetime&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;fromisoformat&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#E64553;--shiki-light-font-style:italic;--shiki-dark:#EE99A0;--shiki-dark-font-style:italic&quot;&gt;pkg&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-light-font-style:italic;--shiki-dark:#A6DA95;--shiki-dark-font-style:italic&quot;&gt;time&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;][&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;created&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;].&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;replace&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;Z&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &apos;+00:00&apos;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;    return&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; published &lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;&amp;#x3C;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; datetime&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;fromisoformat&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;commit_date&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;scripts/validate_deps.py&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 1: Detect slopsquatting by checking package age. Conceptual implementation only.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;why-does-this-matter-at-enterprise-scale&quot;&gt;Why Does This Matter at Enterprise Scale?&lt;/h2&gt;
&lt;p&gt;Sonatype’s 2024 State of the Software Supply Chain report documented &lt;a href=&quot;https://www.infosecurity-magazine.com/news/156-increase-in-oss-malicious/&quot;&gt;512,847 malicious packages&lt;/a&gt; discovered between November 2023 and November 2024. A 156% year-over-year increase. The Verizon 2025 DBIR found that &lt;a href=&quot;https://deepstrike.io/blog/supply-chain-attack-statistics-2025&quot;&gt;30% of breaches now involve third-party components&lt;/a&gt;, double the previous year.&lt;/p&gt;
&lt;p&gt;Log4Shell proved how fast these risks materialize. According to Wiz and EY, &lt;a href=&quot;https://en.wikipedia.org/wiki/Log4Shell&quot;&gt;93% of enterprise cloud environments&lt;/a&gt; were affected. Four years later, 13% of Log4j downloads from Maven Central still contain the vulnerable version, roughly 40 million vulnerable downloads per year.&lt;/p&gt;
&lt;p&gt;These vectors do not operate in isolation. They cascade. The same “software-begets-software” feedback loop that drove OSS growth now amplifies contraction: fewer maintainers produce fewer packages, which reduces ecosystem quality, which further weakens incentives to share. At 70% vibe coding adoption, engagement-based monetization drops roughly 70%, but OSS entry can only sustain an 11% decline before projects start disappearing. That 59-percentage-point gap is the crisis window.&lt;/p&gt;
&lt;p&gt;Monitoring the right signals matters more than counting CVEs.&lt;/p&gt;
&lt;h2 id=&quot;how-does-the-two-tier-ecosystem-create-different-risks&quot;&gt;How Does the Two-Tier Ecosystem Create Different Risks?&lt;/h2&gt;
&lt;p&gt;Not all dependencies face equal exposure. A two-tier structure is emerging.&lt;/p&gt;
&lt;h3 id=&quot;tier-1-foundation-backed-projects&quot;&gt;Tier 1: Foundation-Backed Projects&lt;/h3&gt;
&lt;p&gt;CNCF, Apache, and Linux Foundation projects operate differently. Kubernetes maintainers are typically employed by member companies. Corporate membership dues fund development. Governance structures distribute responsibility.&lt;/p&gt;
&lt;p&gt;These projects face sustainability challenges: burnout, security maintenance burdens, contributor fatigue. Foundation backing buffers against documentation-traffic collapse, but not full immunity. Corporate membership dues often correlate with ecosystem health; if the broader ecosystem contracts, so does corporate willingness to fund.&lt;/p&gt;
&lt;h3 id=&quot;tier-2-indie-and-vc-backed-projects&quot;&gt;Tier 2: Indie and VC-Backed Projects&lt;/h3&gt;
&lt;p&gt;Tailwind, Bun (pre-acquisition), curl, and thousands of smaller projects depend on sponsorships, consulting revenue, or VC runway. Many have single maintainers. The bus factor is often one.&lt;/p&gt;
&lt;p&gt;These projects underpin the modern web. They are also most exposed to all three attack vectors.&lt;/p&gt;
&lt;p&gt;Most enterprise stacks span both tiers. Kubernetes (Tier 1) might orchestrate containers running applications built with Tailwind (Tier 2). The risk profiles differ, and monitoring should reflect that.&lt;/p&gt;
&lt;h2 id=&quot;how-should-organizations-assess-ai-exposure-risk&quot;&gt;How Should Organizations Assess AI Exposure Risk?&lt;/h2&gt;
&lt;p&gt;Traditional dependency scanning catches CVEs. It does not catch maintainer burnout, revenue collapse, or AI-hallucinated packages. Additional signals are needed.&lt;/p&gt;
&lt;h3 id=&quot;the-5-signal-ai-exposure-audit&quot;&gt;The 5-Signal AI Exposure Audit&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Funding model&lt;/strong&gt;: Corporate-backed or sponsorship-dependent? Check GitHub Sponsors, Open Collective, or company backing. Sponsorship-dependent projects face higher AI exposure.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Contributor count&lt;/strong&gt;: Bus factor greater than three? XZ Utils had one active contributor. Look at commit history and active contributors over the past 12 months.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Governance&lt;/strong&gt;: Foundation membership or solo maintainer? CNCF and Apache projects have succession plans. Solo projects often do not.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;AI exposure score&lt;/strong&gt;: Docs-driven monetization (high exposure) or infrastructure utility (lower exposure)? UI libraries and developer tools face higher risk than compression utilities or parsers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Recent signals&lt;/strong&gt;: Layoffs, acquisition talks, burnout posts? Monitor project blogs, maintainer social media, and GitHub discussions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&quot;decision-thresholds&quot;&gt;Decision Thresholds&lt;/h3&gt;
&lt;p&gt;The following framework maps signals to response levels:&lt;/p&gt;






























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Risk Level&lt;/th&gt;&lt;th&gt;Signals&lt;/th&gt;&lt;th&gt;Action&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Monitor&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;1-2 signals, Tier 1 project&lt;/td&gt;&lt;td&gt;Quarterly review&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Watch&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;2-3 signals, any tier&lt;/td&gt;&lt;td&gt;Monthly review, identify alternatives&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Mitigate&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;3+ signals, Tier 2 project&lt;/td&gt;&lt;td&gt;Sponsor, fork, or migrate&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Critical&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Active incidents (layoffs, security events)&lt;/td&gt;&lt;td&gt;Immediate review, contingency plan&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 1: Decision thresholds for dependency risk response.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;enterprise-application&quot;&gt;Enterprise Application&lt;/h3&gt;
&lt;p&gt;Consider a fintech platform with 400 npm dependencies. Traditional scanning surfaces CVEs. The AI exposure audit surfaces different risks:&lt;/p&gt;



































&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Dependency Type&lt;/th&gt;&lt;th&gt;Count&lt;/th&gt;&lt;th&gt;High AI Exposure&lt;/th&gt;&lt;th&gt;Action&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;UI frameworks&lt;/td&gt;&lt;td&gt;12&lt;/td&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;Review monetization models&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Build tools&lt;/td&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;Track bus factor, burnout signals&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Infrastructure&lt;/td&gt;&lt;td&gt;45&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Lower priority&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Utility libraries&lt;/td&gt;&lt;td&gt;335&lt;/td&gt;&lt;td&gt;23&lt;/td&gt;&lt;td&gt;Automate monitoring&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 2: AI exposure audit across dependency categories.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The 23 high-exposure utility libraries are not all equal. Prioritize by criticality: is it in the authentication path? The payment flow? The deployment pipeline?&lt;/p&gt;
&lt;h2 id=&quot;how-does-vibe-coding-multiply-supply-chain-risk&quot;&gt;How Does Vibe Coding Multiply Supply Chain Risk?&lt;/h2&gt;
&lt;p&gt;Andrej Karpathy &lt;a href=&quot;https://x.com/karpathy/status/1886192184808149383&quot;&gt;coined “vibe coding”&lt;/a&gt; in February 2025: developers who “fully give in to the vibes” and let AI generate entire applications. The practice is &lt;a href=&quot;https://www.infosecurity-magazine.com/opinions/vibe-coding-security-risk-ai/&quot;&gt;accelerating&lt;/a&gt;, and it compounds every vector above.&lt;/p&gt;
&lt;p&gt;Slopsquatting is the visible risk. The less visible one: license laundering. AI training data includes GPL and AGPL (copyleft) code. When developers accept AI-generated output without review, they risk shipping restrictive-licensed code as proprietary. The provenance is untraceable. Tools like &lt;a href=&quot;https://fossa.com/solutions/oss-license-compliance/&quot;&gt;FOSSA&lt;/a&gt; and &lt;a href=&quot;https://snyk.io/product/open-source-security-management/license-compliance/&quot;&gt;Snyk License Compliance&lt;/a&gt; can scan for license violations in CI, but they catch dependencies, not AI-generated source code. Manual review remains essential.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;jobs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  license-scan&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    runs-on&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ubuntu-latest&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    steps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Checkout repository&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; actions/checkout@v6&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Scan licenses with FOSSA&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; fossas/fossa-action@v1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        with&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          api-key&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ${{ secrets.FOSSA_API_KEY }}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          run-tests&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; true&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;.github/workflows/license-check.yml&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 4: FOSSA scans dependencies for GPL/AGPL violations and fails the build on policy breach.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;AI-assisted commits warrant validation: flagging new dependencies, verifying packages existed before the commit date (Code Snippet 1), and checking maintainer history. For teams using AI agents extensively, &lt;a href=&quot;/posts/orchestrating-ai-agents-subagent-architecture/&quot;&gt;subagent architectures&lt;/a&gt; can dedicate a validation agent to check every dependency before acceptance.&lt;/p&gt;
&lt;h2 id=&quot;how-can-organizations-defend-against-ai-supply-chain-attacks&quot;&gt;How Can Organizations Defend Against AI Supply Chain Attacks?&lt;/h2&gt;
&lt;h3 id=&quot;extend-the-toolchain&quot;&gt;Extend the Toolchain&lt;/h3&gt;
&lt;p&gt;Existing tools catch CVEs. Tools that catch sustainability risks fill the gap:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenSSF Scorecard&lt;/strong&gt; (scorecard.dev) scores projects on maintainer activity, security practices, and bus factor&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;deps.dev&lt;/strong&gt; provides dependency graphs with contributor data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Socket.dev&lt;/strong&gt; detects supply chain attacks including slopsquatting patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Scorecard and Socket integrate directly into &lt;a href=&quot;/posts/ai-augmented-cicd/&quot;&gt;AI-augmented CI/CD pipelines&lt;/a&gt; via GitHub Actions, flagging risky dependencies before merge.&lt;/p&gt;
&lt;p&gt;For a quick CLI check:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# Check a project&apos;s health score (0-10)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt;scorecard&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; --repo=tailwindlabs/tailwindcss&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 2: OpenSSF Scorecard CLI checks a repository’s security health score (0-10).&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;To enforce this in CI, integrate Scorecard into a GitHub Actions workflow. The workflow below fails the build if a dependency scores below 7 out of 10, preventing risky dependencies from entering production:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Run OpenSSF Scorecard&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ossf/scorecard-action@v2&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  with&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    results_file&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; scorecard.json&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Block on low score&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  run&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; |&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;    jq -e &apos;.score &gt;= 7&apos; scorecard.json || exit 1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;.github/workflows/scorecard.yml&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 3: GitHub Actions workflow to enforce minimum dependency health score of 7/10 in CI.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;sponsor-strategically&quot;&gt;Sponsor Strategically&lt;/h3&gt;
&lt;p&gt;Tooling catches symptoms. Sponsorship addresses the cause. If a Tier 2 project in a critical path shows stress signals, $500/month establishes a relationship with the maintainer and early warning on sustainability issues.&lt;/p&gt;
&lt;p&gt;But individual sponsorship is a stopgap, not a solution. The systemic fix requires platform-level change. One proposed model: AI coding platforms already track which packages they import, so they could redistribute subscription revenue to maintainers based on attributed usage, a “Spotify for open source.” The infrastructure exists; the coordination does not. Until it does, direct sponsorship remains the best lever CTOs have.&lt;/p&gt;
&lt;p&gt;The companies that sponsored Log4j before Log4Shell had maintainer relationships when the crisis hit. The ones that did not scrambled with everyone else.&lt;/p&gt;
&lt;h2 id=&quot;how-will-relicensing-reshape-the-stack&quot;&gt;How Will Relicensing Reshape the Stack?&lt;/h2&gt;
&lt;p&gt;The OSS ecosystem is adapting. Bun’s &lt;a href=&quot;https://bun.sh/blog/bun-joins-anthropic&quot;&gt;acquisition by Anthropic&lt;/a&gt; shows one path: AI companies absorbing critical infrastructure. More acquisitions are likely.&lt;/p&gt;
&lt;p&gt;Experimental protocols like x402 attempt to let AI agents pay for resource access. Still early, but architecturally sound.&lt;/p&gt;
&lt;p&gt;The two-tier ecosystem is crystallizing. Foundation-backed projects will weather it better. Indie projects will face consolidation pressure, through acquisition, abandonment, new business models, or relicensing. HashiCorp moved Terraform to BSL in 2023. Redis followed months later. Sentry, MariaDB, Elastic. The pattern is clear. When sponsorships fail and AI erodes documentation revenue, restrictive licenses become the survival strategy. For enterprises, this means dependencies assumed to be permissively licensed may not stay that way.&lt;/p&gt;
&lt;h3 id=&quot;before-the-next-sprint&quot;&gt;Before the Next Sprint&lt;/h3&gt;
&lt;ul class=&quot;contains-task-list&quot;&gt;
&lt;li class=&quot;task-list-item&quot;&gt;&lt;input type=&quot;checkbox&quot; disabled&gt; Run OpenSSF Scorecard on the top 10 dependencies&lt;/li&gt;
&lt;li class=&quot;task-list-item&quot;&gt;&lt;input type=&quot;checkbox&quot; disabled&gt; Check bus factor on Tier 2 critical-path projects&lt;/li&gt;
&lt;li class=&quot;task-list-item&quot;&gt;&lt;input type=&quot;checkbox&quot; disabled&gt; Add slopsquatting validation to CI (Code Snippet 1)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AI adoption is accelerating. So are the attacks exploiting it. The question is not whether a supply chain has risk, but whether teams that see it first have the advantage.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CrowdStrike, “CVE-2024-3094 and XZ Upstream Supply Chain Attack” (2024) - &lt;a href=&quot;https://www.crowdstrike.com/en-us/blog/cve-2024-3094-xz-upstream-supply-chain-attack/&quot;&gt;https://www.crowdstrike.com/en-us/blog/cve-2024-3094-xz-upstream-supply-chain-attack/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Goodin, Dan, “Overrun with AI slop, cURL scraps bug bounties” (Ars Technica, 2026) - &lt;a href=&quot;https://arstechnica.com/security/2026/01/overrun-with-ai-slop-curl-scraps-bug-bounties-to-ensure-intact-mental-health/&quot;&gt;https://arstechnica.com/security/2026/01/overrun-with-ai-slop-curl-scraps-bug-bounties-to-ensure-intact-mental-health/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Infosecurity Magazine, “Vibe Coding: A Hidden Security Risk of the AI Era” (2025) - &lt;a href=&quot;https://www.infosecurity-magazine.com/opinions/vibe-coding-security-risk-ai/&quot;&gt;https://www.infosecurity-magazine.com/opinions/vibe-coding-security-risk-ai/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Karpathy, Andrej, “vibe coding” (X, 2025) - &lt;a href=&quot;https://x.com/karpathy/status/1886192184808149383&quot;&gt;https://x.com/karpathy/status/1886192184808149383&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Koren, Miklos et al., “Vibe Coding Kills Open Source” (arXiv, 2026) - &lt;a href=&quot;https://arxiv.org/abs/2601.15494&quot;&gt;https://arxiv.org/abs/2601.15494&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Snyk, “Slopsquatting: New AI Hallucination Threats” (2025) - &lt;a href=&quot;https://snyk.io/articles/slopsquatting-mitigation-strategies/&quot;&gt;https://snyk.io/articles/slopsquatting-mitigation-strategies/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sonatype, “10th Annual State of the Software Supply Chain” (2024) - &lt;a href=&quot;https://www.sonatype.com/state-of-the-software-supply-chain/introduction&quot;&gt;https://www.sonatype.com/state-of-the-software-supply-chain/introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Spracklen, Joseph et al., “We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs” (arXiv, 2025) - &lt;a href=&quot;https://arxiv.org/abs/2406.10279&quot;&gt;https://arxiv.org/abs/2406.10279&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sumner, Jarred, “Bun is joining Anthropic” (2025) - &lt;a href=&quot;https://bun.sh/blog/bun-joins-anthropic&quot;&gt;https://bun.sh/blog/bun-joins-anthropic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Verizon, “2025 Data Breach Investigations Report” - &lt;a href=&quot;https://www.verizon.com/business/resources/reports/dbir/&quot;&gt;https://www.verizon.com/business/resources/reports/dbir/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Wathan, Adam, “GitHub comment on Tailwind layoffs” (2026) - &lt;a href=&quot;https://github.com/tailwindlabs/tailwindcss.com/pull/2388#issuecomment-3717222957&quot;&gt;https://github.com/tailwindlabs/tailwindcss.com/pull/2388#issuecomment-3717222957&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>security</category><category>threat-analysis</category><author>Hugues Clouâtre</author></item><item><title>Orchestrating AI Agents: A Subagent Architecture for Code</title><link>https://clouatre.ca/posts/orchestrating-ai-agents-subagent-architecture/</link><guid isPermaLink="true">https://clouatre.ca/posts/orchestrating-ai-agents-subagent-architecture/</guid><description>50% cost reduction with subagent architecture for AI coding. Capable models for planning, fast models for building. Real metrics from Goose.</description><pubDate>Tue, 24 Feb 2026 12:46:00 GMT</pubDate><content:encoded>&lt;p&gt;Single-agent AI coding hits a ceiling. Context windows fill up. Role confusion creeps in. Output quality degrades. The solution: multiple specialized models with structured handoffs. One model plans, another builds, a third validates. Each starts fresh. Each excels at its role.&lt;/p&gt;
&lt;p&gt;Basic code assistants show roughly 10% productivity gains. But companies pairing AI with end-to-end process transformation report &lt;a href=&quot;https://www.bain.com/insights/from-pilots-to-payoff-generative-ai-in-software-development-technology-report-2025/&quot;&gt;25-30% improvements&lt;/a&gt; (Bain, 2025). The difference isn’t the model. It’s the architecture, specifically how you engineer the context each agent receives.&lt;/p&gt;
&lt;p&gt;Anthropic’s research on multi-agent systems confirms what we observe: architecture matters more than model choice. Their finding that &lt;a href=&quot;https://www.anthropic.com/engineering/multi-agent-research-system&quot;&gt;“token usage explains 80% of the variance”&lt;/a&gt; reflects the impact of isolation: focused context rather than accumulated conversation history.&lt;/p&gt;
&lt;p&gt;This post documents a production workflow using &lt;a href=&quot;https://github.com/block/goose&quot;&gt;Goose&lt;/a&gt;, an open-source AI assistant. The architecture separates planning, building, and validation into distinct phases, each with a different model optimized for the task.&lt;/p&gt;
&lt;h2 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;&lt;details&gt;&lt;summary&gt;Contents&lt;/summary&gt;&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#why-do-single-agent-ai-coding-workflows-hit-a-ceiling&quot;&gt;Why Do Single-Agent AI Coding Workflows Hit a Ceiling?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#context-rot&quot;&gt;Context Rot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#role-confusion&quot;&gt;Role Confusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#accumulated-errors&quot;&gt;Accumulated Errors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-does-subagent-architecture-solve-context-problems&quot;&gt;How Does Subagent Architecture Solve Context Problems?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-does-model-selection-affect-cost-and-quality&quot;&gt;How Does Model Selection Affect Cost and Quality?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#cost-optimization&quot;&gt;Cost Optimization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-minimalist-instructions-matter&quot;&gt;Why Minimalist Instructions Matter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-does-project-context-reach-subagents&quot;&gt;How Does Project Context Reach Subagents?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-do-subagents-communicate&quot;&gt;How Do Subagents Communicate?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#where-should-human-judgment-stay-in-ai-workflows&quot;&gt;Where Should Human Judgment Stay in AI Workflows?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-results-does-this-produce&quot;&gt;What Results Does This Produce?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#design-targets&quot;&gt;Design Targets&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#when-does-this-work-and-when-doesnt-it&quot;&gt;When Does This Work (and When Doesn’t It)?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#takeaways&quot;&gt;Takeaways&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;&lt;/details&gt;&lt;p&gt;&lt;/p&gt;
&lt;h2 id=&quot;why-do-single-agent-ai-coding-workflows-hit-a-ceiling&quot;&gt;Why Do Single-Agent AI Coding Workflows Hit a Ceiling?&lt;/h2&gt;
&lt;p&gt;A single AI model handling an entire coding task accumulates context with every interaction. By implementation time, the model carries baggage from analysis, research, and planning phases. This stems from three core problems.&lt;/p&gt;
&lt;h3 id=&quot;context-rot&quot;&gt;Context Rot&lt;/h3&gt;
&lt;p&gt;Long conversations consume token budgets. The model forgets early instructions or weighs recent context too heavily. Chroma Research calls this &lt;a href=&quot;https://research.trychroma.com/context-rot&quot;&gt;context rot: performance degrades consistently as input tokens increase&lt;/a&gt;, even on simple tasks, and worsens for multi-step reasoning like coding (Chroma Research, 2025). On-demand retrieval adds another failure mode: agents miss context 56% of the time because they don’t recognize when to fetch it (Gao, 2026).&lt;/p&gt;
&lt;h3 id=&quot;role-confusion&quot;&gt;Role Confusion&lt;/h3&gt;
&lt;p&gt;A model asked to analyze, plan, implement, and validate lacks clear boundaries. It starts implementing during planning. It skips validation steps. Outputs blur together.&lt;/p&gt;
&lt;h3 id=&quot;accumulated-errors&quot;&gt;Accumulated Errors&lt;/h3&gt;
&lt;p&gt;Mistakes in early phases propagate. A misunderstanding in analysis leads to a flawed plan. A flawed plan leads to incorrect implementation. Fixing requires starting over.&lt;/p&gt;
&lt;h2 id=&quot;how-does-subagent-architecture-solve-context-problems&quot;&gt;How Does Subagent Architecture Solve Context Problems?&lt;/h2&gt;
&lt;p&gt;The fix: spawn specialized subagents for each phase. An orchestrator handles high-level coordination and human interaction. Subagents handle execution with fresh context.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Subagent workflow diagram showing Orchestrator with RESEARCH, PLAN phases flowing to Builder and Validator subagents&quot; loading=&quot;eager&quot; decoding=&quot;sync&quot; fetchpriority=&quot;high&quot; sizes=&quot;(min-width: 260px) 260px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;260&quot; height=&quot;1066&quot; src=&quot;/_astro/subagent-workflow.mAYdpe6d_JYXoS.webp&quot; srcset=&quot;/_astro/subagent-workflow.mAYdpe6d_JYXoS.webp 260w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Core subagent workflow. Orchestrator handles RESEARCH (with human gate) and PLAN. Builder and Validator run as separate subagents with fresh context. SETUP and COMMIT/PR phases omitted for clarity.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The orchestrator (Claude Opus 4.5) handles RESEARCH and PLAN phases. RESEARCH requires human judgment at a single gate to decide the approach. After plan completion, it spawns a BUILD subagent (Claude Haiku 4.5) that receives only the plan, not accumulated history. The builder writes code, runs tests, then hands off to a CHECK subagent (Claude Sonnet 4.5) for validation.&lt;/p&gt;
&lt;p&gt;Each subagent starts with clean context. The builder knows what to build, not how we decided to build it. The validator knows what was built, not what alternatives we considered. This is context engineering in practice: designing what each agent sees rather than letting context accumulate. This isolation directly counteracts context rot and drives the performance gains research attributes to architecture over model selection (Anthropic Engineering, 2025).&lt;/p&gt;
&lt;h2 id=&quot;how-does-model-selection-affect-cost-and-quality&quot;&gt;How Does Model Selection Affect Cost and Quality?&lt;/h2&gt;
&lt;p&gt;Different phases need different capabilities. Planning requires reasoning. Building requires speed and instruction-following. Validation requires balanced judgment.&lt;/p&gt;





























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Model&lt;/th&gt;&lt;th&gt;Role&lt;/th&gt;&lt;th&gt;Temperature&lt;/th&gt;&lt;th&gt;Rationale&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Opus&lt;/td&gt;&lt;td&gt;Orchestrator&lt;/td&gt;&lt;td&gt;0.5&lt;/td&gt;&lt;td&gt;High reasoning for research and planning&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Haiku&lt;/td&gt;&lt;td&gt;Builder&lt;/td&gt;&lt;td&gt;0.2&lt;/td&gt;&lt;td&gt;Fast, cheap, precise instruction-following&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Sonnet&lt;/td&gt;&lt;td&gt;Validator&lt;/td&gt;&lt;td&gt;0.1&lt;/td&gt;&lt;td&gt;Balanced judgment, conservative (catches issues)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 1: Model selection by phase. Temperature decreases as tasks become more deterministic.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;cost-optimization&quot;&gt;Cost Optimization&lt;/h3&gt;
&lt;p&gt;Building involves the most token-heavy work: reading files, writing code, running tests. Routing this volume to cheaper models cuts costs significantly.&lt;/p&gt;





























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Model&lt;/th&gt;&lt;th&gt;Input&lt;/th&gt;&lt;th&gt;Output&lt;/th&gt;&lt;th&gt;Role in Workflow&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Opus&lt;/td&gt;&lt;td&gt;$5/MTok&lt;/td&gt;&lt;td&gt;$25/MTok&lt;/td&gt;&lt;td&gt;Planning (~20% of tokens)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Sonnet&lt;/td&gt;&lt;td&gt;$3/MTok&lt;/td&gt;&lt;td&gt;$15/MTok&lt;/td&gt;&lt;td&gt;Validation (~20% of tokens)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Haiku&lt;/td&gt;&lt;td&gt;$1/MTok&lt;/td&gt;&lt;td&gt;$5/MTok&lt;/td&gt;&lt;td&gt;Building (~60% of tokens)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 2: Anthropic API pricing, December 2025. Building consumes the most tokens at the lowest cost.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Research on multi-agent LLM systems shows up to 94% cost reduction through model cascading (Gandhi et al., 2025). This architecture targets 50-60% savings by routing building work to Haiku while preserving Opus for planning.&lt;/p&gt;
&lt;p&gt;Beyond cost, fresh context enables tasks that fail with single agents. A 12-file refactor that exhausts a single model’s context window succeeds when each subagent starts clean.&lt;/p&gt;
&lt;h3 id=&quot;why-minimalist-instructions-matter&quot;&gt;Why Minimalist Instructions Matter&lt;/h3&gt;
&lt;p&gt;Smaller models like Haiku excel with focused, explicit prompts. Complex multi-step instructions cause drift. The recipe went through multiple iterations to find the right balance: enough context to execute correctly, minimal enough to avoid confusion. Each phase prompt fits in under 500 tokens. The builder receives a structured JSON plan, not prose. Constraints beat verbosity.&lt;/p&gt;
&lt;h3 id=&quot;how-does-project-context-reach-subagents&quot;&gt;How Does Project Context Reach Subagents?&lt;/h3&gt;
&lt;p&gt;Recipes define the workflow, but subagents also need project context: build commands, conventions, file structure. That’s where &lt;a href=&quot;https://agents-md.org/&quot;&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/a&gt; comes in, a portable markdown file that provides the baseline knowledge every subagent inherits. Goose, Claude Code, Cursor, Codex, and &lt;a href=&quot;https://agents-md.org/&quot;&gt;40+ other tools&lt;/a&gt; read it natively. In &lt;a href=&quot;https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals&quot;&gt;Vercel’s evals&lt;/a&gt; (Gao, 2026), an &lt;code&gt;AGENTS.md&lt;/code&gt; file achieved a 100% pass rate on build, lint, and test tasks where skills-based approaches maxed out at 79%.&lt;/p&gt;
&lt;p&gt;Think of it as CSS for agents: global rules cascade into every project, project-specific rules override where needed. The orchestrator and every subagent it spawns inherit both layers without explicit prompting.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;markdown&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;## Commits&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; Conventional commits, GPG signed and DCO sign-off&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; Feature branches only, PRs for everything&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; Never merge without explicit user request&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;## Security&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; Treat all repositories as public&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; No secrets, API keys, credentials, or PII&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;~/.config/goose/AGENTS.md&lt;/span&gt;&lt;/pre&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;markdown&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;## Stack&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;Rust 2024 + Tokio + Clap (derive) + Octocrab&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;## Project-Specific Patterns&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; Apache-2.0 license with SPDX headers&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt; cargo-deny for dependency audits&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;aptu/AGENTS.md&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 1: Global and project-level AGENTS.md files. The builder subagent from Table 3 inherits both layers: it knows to GPG-sign commits (global) and use cargo-deny (project) without either appearing in the handoff JSON.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;how-do-subagents-communicate&quot;&gt;How Do Subagents Communicate?&lt;/h2&gt;
&lt;p&gt;Subagents communicate through JSON files in &lt;code&gt;$WORKTREE/.handoff/&lt;/code&gt;. Each session uses an isolated git worktree, so handoff files are scoped to that execution context. This creates an explicit contract between phases.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;text&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;$WORKTREE/.handoff/&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;├── 02-plan.json      # Orchestrator → Builder&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;├── 03-build.json     # Builder → Validator  &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;└── 04-validation.json # Validator → Builder (on failure)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 2: Handoff directory structure showing the JSON files that pass context between phases.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The plan file contains everything the builder needs:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;json&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;{&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;overview&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;Remove 4 dead render_with_context methods&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;files&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    {&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;path&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;src/output/triage.rs&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;action&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;modify&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;},&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    {&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;path&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;src/output/history.rs&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;action&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;modify&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;},&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    {&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;path&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;src/output/bulk.rs&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;action&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;modify&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;},&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    {&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;path&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;src/output/create.rs&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;action&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;modify&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  ],&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;steps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;    &quot;Remove render_with_context impl blocks from each file&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;    &quot;Remove #[allow(dead_code)] annotations&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;    &quot;Remove unused imports&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;    &quot;Run cargo fmt &amp;#x26;&amp;#x26; cargo clippy &amp;#x26;&amp;#x26; cargo test&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  ],&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;risks&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;None - confirmed dead code&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;.handoff/02-plan.json&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 3: Plan handoff file with structured task definition for the builder subagent.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The validator reads both &lt;code&gt;02-plan.json&lt;/code&gt; and &lt;code&gt;03-build.json&lt;/code&gt; to verify implementation matches requirements. It writes structured feedback to &lt;code&gt;04-validation.json&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;json&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;{&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;verdict&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;FAIL&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;checks&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;    {&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;Remove #[allow(dead_code)] annotations&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;status&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;FAIL&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;     &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;notes&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;Annotations still present in history.rs:145, bulk.rs:31, create.rs:63&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  ],&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;issues&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;Plan required removing annotations, but these are still present&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;],&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;  &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;next_steps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;Fix issue: Remove the three annotations, then re-validate&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;.handoff/04-validation.json&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 4: Validation handoff file with actionable feedback for the builder to address.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The builder reads this feedback, fixes the specific issues, and triggers another CHECK cycle until validation passes.&lt;/p&gt;
&lt;p&gt;Why files instead of memory? Three reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Auditable.&lt;/strong&gt; Every decision is recorded. Debug failures by reading the handoff chain.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resumable.&lt;/strong&gt; Interrupt and resume without losing state. Start a new session with the same handoff files and no work is lost.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Debuggable.&lt;/strong&gt; Failed validations include exact locations and actionable next steps.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&quot;where-should-human-judgment-stay-in-ai-workflows&quot;&gt;Where Should Human Judgment Stay in AI Workflows?&lt;/h2&gt;
&lt;p&gt;Not every phase needs human approval. The workflow distinguishes between decisions (require judgment) and execution (follow the plan).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Phases with gates (human approval required):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;RESEARCH: “Which of these approaches should we take?”&lt;/li&gt;
&lt;li&gt;CHECK (conditional): On FAIL or PASS WITH NOTES, human decides whether to fix issues or proceed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Phases without gates (auto-proceed):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SETUP: Initialize context and gather requirements&lt;/li&gt;
&lt;li&gt;PLAN: Design solution based on approved research direction&lt;/li&gt;
&lt;li&gt;BUILD: Execute the approved plan&lt;/li&gt;
&lt;li&gt;CHECK (on PASS): Validation passed, proceed to commit&lt;/li&gt;
&lt;li&gt;COMMIT/PR: Push validated changes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This separation preserves governance without creating bottlenecks. Humans make strategic decisions. AI executes. If validation fails, the system loops back to BUILD with specific feedback. No human intervention for mechanical fixes.&lt;/p&gt;
&lt;h2 id=&quot;what-results-does-this-produce&quot;&gt;What Results Does This Produce?&lt;/h2&gt;
&lt;p&gt;This architecture powers development across multiple projects. Three examples from &lt;a href=&quot;https://github.com/clouatre-labs/aptu&quot;&gt;aptu&lt;/a&gt;:&lt;/p&gt;

























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;PR&lt;/th&gt;&lt;th&gt;Scope&lt;/th&gt;&lt;th&gt;Files Changed&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;https://github.com/clouatre-labs/aptu/pull/272&quot;&gt;#272&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Consolidate 4 clients → 1 generic&lt;/td&gt;&lt;td&gt;9 files&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;https://github.com/clouatre-labs/aptu/pull/256&quot;&gt;#256&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Add Groq + Cerebras providers&lt;/td&gt;&lt;td&gt;9 files&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;https://github.com/clouatre-labs/aptu/pull/244&quot;&gt;#244&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Extract shared AiProvider trait&lt;/td&gt;&lt;td&gt;9 files&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 3: Representative PRs using subagent architecture. All passed CI, all merged without rework.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The validation phase caught issues the builder missed. In PR #272, the CHECK subagent identified a missing trait bound that would have failed compilation. The builder fixed it on the retry loop. No human intervention required.&lt;/p&gt;
&lt;h3 id=&quot;design-targets&quot;&gt;Design Targets&lt;/h3&gt;
&lt;p&gt;Research on multi-agent frameworks for code generation shows they &lt;a href=&quot;https://arxiv.org/abs/2510.08804&quot;&gt;consistently outperform single-model systems&lt;/a&gt; (Raghavan &amp;#x26; Mallick, 2025). The architecture is designed to achieve:&lt;/p&gt;

























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Single Agent&lt;/th&gt;&lt;th&gt;Subagent Architecture&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Context at build phase&lt;/td&gt;&lt;td&gt;~50K tokens&lt;/td&gt;&lt;td&gt;~5K tokens (fresh)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Rework loops&lt;/td&gt;&lt;td&gt;2-3 typical&lt;/td&gt;&lt;td&gt;0-1 expected&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Human interventions&lt;/td&gt;&lt;td&gt;Throughout&lt;/td&gt;&lt;td&gt;Only at gates&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 4: Design targets based on context isolation and structured handoffs.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;when-does-this-work-and-when-doesnt-it&quot;&gt;When Does This Work (and When Doesn’t It)?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Works well for:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multi-file refactors where context isolation prevents confusion&lt;/li&gt;
&lt;li&gt;Feature additions following established patterns&lt;/li&gt;
&lt;li&gt;Complex changes requiring distinct planning and execution&lt;/li&gt;
&lt;li&gt;Teams wanting audit trails (handoff files document decisions)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Less effective for:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Simple one-file fixes (overhead exceeds benefit)&lt;/li&gt;
&lt;li&gt;Legacy systems without clear patterns (builder lacks context)&lt;/li&gt;
&lt;li&gt;Exploratory work where plans change during implementation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For teams integrating AI agents with legacy systems, see &lt;a href=&quot;/posts/ai-agents-legacy-roi&quot;&gt;AI agents in legacy environments&lt;/a&gt; for integration patterns that work when your data lives in mainframes and AS400 systems.&lt;/p&gt;
&lt;h2 id=&quot;takeaways&quot;&gt;Takeaways&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Separate reasoning from execution.&lt;/strong&gt; Use capable models for planning, fast models for building.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fresh context beats accumulated context.&lt;/strong&gt; Subagents start clean. They follow instructions without historical baggage.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Structured handoffs create audit trails.&lt;/strong&gt; JSON files document what was planned, built, and validated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gates at decisions, not execution.&lt;/strong&gt; Human judgment for strategy. Automated loops for implementation.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The full recipe is available as a &lt;a href=&quot;https://gist.github.com/clouatre/22d4451725f3c64dabe680297bbd35d7&quot;&gt;GitHub Gist&lt;/a&gt;. It builds on patterns from &lt;a href=&quot;/posts/ai-assisted-development-judgment-over-implementation/&quot;&gt;AI-Assisted Development: From Implementation to Judgment&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Anthropic Engineering, “How we built our multi-agent research system” (2025) — &lt;a href=&quot;https://www.anthropic.com/engineering/multi-agent-research-system&quot;&gt;https://www.anthropic.com/engineering/multi-agent-research-system&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Bain &amp;#x26; Company, “From Pilots to Payoff: Generative AI in Software Development” (2025) — &lt;a href=&quot;https://www.bain.com/insights/from-pilots-to-payoff-generative-ai-in-software-development-technology-report-2025/&quot;&gt;https://www.bain.com/insights/from-pilots-to-payoff-generative-ai-in-software-development-technology-report-2025/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Chroma Research, “Context Rot: How Increasing Input Tokens Impacts LLM Performance” (2025) — &lt;a href=&quot;https://research.trychroma.com/context-rot&quot;&gt;https://research.trychroma.com/context-rot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Gao, Jude, “AGENTS.md outperforms skills in our agent evals” (2026) — &lt;a href=&quot;https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals&quot;&gt;https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Gandhi et al., “BudgetMLAgent: A Cost-Effective LLM Multi-Agent System” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2411.07464&quot;&gt;https://arxiv.org/abs/2411.07464&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Raghavan &amp;#x26; Mallick, “MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2510.08804&quot;&gt;https://arxiv.org/abs/2510.08804&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-engineering</category><category>architecture</category><category>goose</category><category>implementation-guide</category><author>Hugues Clouâtre</author></item><item><title>AI-Augmented CI/CD - Shift Left Security Without the Risk</title><link>https://clouatre.ca/posts/ai-augmented-cicd/</link><guid isPermaLink="true">https://clouatre.ca/posts/ai-augmented-cicd/</guid><description>AI code review in CI/CD without prompt injection. Defensive patterns: three security tiers, isolated execution, no secrets in prompts.</description><pubDate>Fri, 06 Feb 2026 09:01:00 GMT</pubDate><content:encoded>&lt;p&gt;Code reviews are a bottleneck. Engineering teams lose measurable velocity waiting for feedback. This delay compounds when security vulnerabilities escalate. Fixing a defect in production costs 30-100× more than fixing it during design (Boehm &amp;#x26; Basili, 2001). The economics are clear: early detection reduces downstream costs exponentially.&lt;/p&gt;
&lt;p&gt;AI in CI/CD augments human review by analyzing code patterns and tool outputs before human reviewers see the changes. Analysis completes in 2-7 seconds compared to 4-22 hour human review cycles.&lt;/p&gt;
&lt;h2 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;&lt;details&gt;&lt;summary&gt;Contents&lt;/summary&gt;&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#what-is-the-real-cost-of-manual-code-review&quot;&gt;What Is the Real Cost of Manual Code Review?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-review-bottleneck&quot;&gt;The Review Bottleneck&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-security-cost-multiplier&quot;&gt;The Security Cost Multiplier&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-prompt-injection-risk&quot;&gt;The Prompt Injection Risk&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-does-ai-integration-work-without-prompt-injection-risk&quot;&gt;How Does AI Integration Work Without Prompt Injection Risk?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#tier-2-and-tier-3-speed-vs-security-trade-offs&quot;&gt;Tier 2 and Tier 3: Speed vs. Security Trade-offs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#evolution-from-uncontrolled-to-managed-ai-analysis&quot;&gt;Evolution From Uncontrolled to Managed AI Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-outcomes-does-ai-augmented-cicd-deliver&quot;&gt;What Outcomes Does AI-Augmented CI/CD Deliver?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#implementation-guide&quot;&gt;Implementation Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;&lt;/details&gt;&lt;p&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-is-the-real-cost-of-manual-code-review&quot;&gt;What Is the Real Cost of Manual Code Review?&lt;/h2&gt;
&lt;h3 id=&quot;the-review-bottleneck&quot;&gt;The Review Bottleneck&lt;/h3&gt;
&lt;p&gt;Development velocity correlates with code review latency. Code review bottlenecks are well-documented across engineering teams. Feedback loops stretch from hours to days while developers context-switch or wait on reviewers. Research from Forsgren et al. (2024) shows context-switching during code review significantly reduces developer productivity and satisfaction.&lt;/p&gt;
&lt;p&gt;GitHub’s 2024 Octoverse reports median time from PR open to first review is 4 hours in large organizations, 22 hours in enterprises. AI summaries reduce this to under 3 minutes.&lt;/p&gt;
&lt;p&gt;Traditional CI/CD pipelines run automated linters and security scanners, generate reports, then stop. A human reads the output, interprets it, decides if it matters, and either approves or comments. This handoff creates velocity bottlenecks. Eight-hour review windows delay production deployments. Critical insights get buried in noise. Studies confirm developers fear review delays will slow delivery, even though they recognize reviews’ long-term quality benefits (Santos et al., 2024). The cost of this wait scales with engineer compensation.&lt;/p&gt;
&lt;h3 id=&quot;the-security-cost-multiplier&quot;&gt;The Security Cost Multiplier&lt;/h3&gt;
&lt;p&gt;Security defects amplify this cost multiplier. IBM and Software Engineering Institute research confirms production fixes can be orders of magnitude more expensive than early detection. The exact multiplier depends on when the defect surfaces. The expenses compound: rework costs, deployment delays, and potential security incidents all increase exponentially downstream.&lt;/p&gt;
&lt;p&gt;Shift-left automation detects issues before a PR merges, before human review begins. AI analyzes linter output, security scan results, and code patterns in seconds. Analysis time: 2-7 seconds compared to 4-8 hour human reviews. Developers receive immediate feedback, iterate faster, and ship with higher confidence.&lt;/p&gt;
&lt;h3 id=&quot;the-prompt-injection-risk&quot;&gt;The Prompt Injection Risk&lt;/h3&gt;
&lt;p&gt;Raw AI analysis of code diffs introduces a critical vulnerability: prompt injection. If a CI/CD pipeline feeds user-submitted code directly to an AI model, an attacker can craft a PR with embedded instructions that manipulate the AI’s behavior. The AI might approve malicious code, disable security checks, or expose sensitive information. This is not theoretical. It represents a live attack surface in every AI-augmented system.&lt;/p&gt;
&lt;p&gt;Defensive architecture mitigates this risk. The AI analyzes &lt;em&gt;tool output&lt;/em&gt; (structured, deterministic results from linters, security scanners, and static analysis) rather than untrusted input directly. The pipeline sequence: linter runs first, generates JSON, AI summarizes the findings, human approves. This reduces the attack surface to near zero.&lt;/p&gt;
&lt;p&gt;Threat models vary by repository type. A private repository with a trusted five-person team tolerates different risk than open-source projects accepting external contributors. Three security tiers match different threat models while maintaining analysis speed.&lt;/p&gt;
&lt;h2 id=&quot;how-does-ai-integration-work-without-prompt-injection-risk&quot;&gt;How Does AI Integration Work Without Prompt Injection Risk?&lt;/h2&gt;
&lt;p&gt;Tier 1 eliminates prompt injection risk. The linter runs first, produces JSON output, and the AI analyzes only that structured data. The AI never sees the raw code, never processes user input, and never runs in the context of potentially malicious diffs.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; AI Analysis - Maximum Security&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;on&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;pull_request&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;permissions&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  contents&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; read&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;jobs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  analyze&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    runs-on&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ubuntu-latest&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    steps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Checkout&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; actions/checkout@v6&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Lint Code&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        run&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; pipx run ruff check --output-format=json . &gt; lint.json || exit 0&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Setup Goose&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; clouatre-labs/setup-goose-action@v1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; AI Analysis&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        env&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          GOOGLE_API_KEY&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ${{ secrets.GOOGLE_API_KEY }}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        run&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; |&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          echo &quot;Summarize these linting issues:&quot; &gt; prompt.txt&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          cat lint.json &gt;&gt; prompt.txt&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          # Only structured tool output appended. Never raw source code.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          goose run --instructions prompt.txt --no-session --quiet &gt; analysis.md&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Upload Analysis&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; actions/upload-artifact@v5&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        with&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ai-analysis&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          path&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; analysis.md&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;tier1-maximum-security.yml&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 1: In Tier 1, AI analyzes only JSON output from the linter, never raw code. &lt;a href=&quot;https://github.com/clouatre-labs/setup-goose-action/blob/main/examples/tier1-maximum-security.yml&quot;&gt;Full example&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The AI sees only JSON. No code, no comments, no user input. Attack surface: zero. This pattern applies to public repositories, open-source projects, and any system where external contributors submit PRs.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Tier 1 defensive pattern: AI analyzes tool output, never sees raw code. Immune to prompt injection.&quot; loading=&quot;eager&quot; decoding=&quot;sync&quot; fetchpriority=&quot;high&quot; sizes=&quot;(min-width: 423px) 423px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;423&quot; height=&quot;480&quot; src=&quot;/_astro/tier1-workflow.BjaCmw1H_Z1MAEKW.webp&quot; srcset=&quot;/_astro/tier1-workflow.BjaCmw1H_Z1MAEKW.webp 423w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Tier 1 defensive pattern. AI analyzes tool output, never sees raw code. Immune to prompt injection.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;tier-2-and-tier-3-speed-vs-security-trade-offs&quot;&gt;Tier 2 and Tier 3: Speed vs. Security Trade-offs&lt;/h2&gt;
&lt;p&gt;Tier 2 provides additional context (file paths, change stats, commit metadata) without exposing raw code. This represents a middle ground: more insight than Tier 1, lower risk than Tier 3.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; AI Analysis - Balanced Security&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;on&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;pull_request&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;permissions&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  contents&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; read&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;jobs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  analyze&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    runs-on&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ubuntu-latest&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    steps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Checkout&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; actions/checkout@v6&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Get Changed Files&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        id&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; files&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        run&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; |&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          git diff --name-only origin/main...HEAD &gt; files.txt&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          wc -l files.txt &gt;&gt; summary.txt&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Setup Goose&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; clouatre-labs/setup-goose-action@v1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; AI Analysis&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        env&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          GOOGLE_API_KEY&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ${{ secrets.GOOGLE_API_KEY }}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        run&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; |&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          echo &quot;Review these file changes:&quot; &gt; prompt.txt&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          cat files.txt summary.txt &gt;&gt; prompt.txt&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          # File names and stats. Not the actual code content.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          goose run --instructions prompt.txt --no-session --quiet &gt; analysis.md&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Upload Analysis&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; actions/upload-artifact@v5&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        with&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ai-analysis&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          path&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; analysis.md&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;tier2-balanced-security.yml&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 2: In Tier 2, AI sees file scope and metadata, but not code diffs. &lt;a href=&quot;https://github.com/clouatre-labs/setup-goose-action/blob/main/examples/tier2-balanced-security.yml&quot;&gt;Full example&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The AI sees file-level patterns but not line-by-line changes. Injection risk is low but non-zero: an attacker could craft filenames or commit messages to manipulate analysis. This tier applies to private repositories with trusted contributors.&lt;/p&gt;
&lt;p&gt;Tier 3 applies to small, trusted teams where analysis speed outweighs defense-in-depth requirements. The AI sees full code diffs. Injection risk exists but is controlled through human approval gates.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; AI Analysis - Advanced Patterns&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt;on&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;pull_request&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;permissions&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  contents&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; read&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;jobs&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  analyze&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    runs-on&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ubuntu-latest&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    steps&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Checkout&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; actions/checkout@v6&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Get Full Diff&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        run&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; git diff origin/main...HEAD &gt; changes.diff&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Setup Goose&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; clouatre-labs/setup-goose-action@v1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; AI Analysis&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        env&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          GOOGLE_API_KEY&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ${{ secrets.GOOGLE_API_KEY }}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        run&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; |&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          echo &quot;Deeply analyze these code changes:&quot; &gt; prompt.txt&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          cat changes.diff &gt;&gt; prompt.txt&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          # Complete code diffs for maximum context and detail&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;          goose run --instructions prompt.txt --no-session --quiet &gt; analysis.md&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;      -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Upload Analysis&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; actions/upload-artifact@v5&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;        with&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ai-analysis&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;          path&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; analysis.md&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;tier3-advanced-patterns.yml&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 3: In Tier 3, AI sees full diffs for subtle patterns. &lt;a href=&quot;https://github.com/clouatre-labs/setup-goose-action/blob/main/examples/tier3-advanced-patterns.yml&quot;&gt;Full example&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Each tier trades visibility for security. Tier 1 eliminates injection risk by sacrificing some context. Tier 2 accepts low risk for moderate context. Tier 3 prioritizes insight over security and is typically used sparingly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tier selection depends on three factors:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Repository access model (external contributors vs internal team)&lt;/li&gt;
&lt;li&gt;Required AI context (tool output vs full diffs)&lt;/li&gt;
&lt;li&gt;Risk tolerance (injection risk vs deeper analysis)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img alt=&quot;Three security tiers side-by-side showing input type, approval gates, and risk levels for each tier.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; sizes=&quot;(min-width: 1202px) 1202px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;1202&quot; height=&quot;1046&quot; src=&quot;/_astro/tier-comparison.C5nIQyIl_Z17voPl.webp&quot; srcset=&quot;/_astro/tier-comparison.C5nIQyIl_ftfa7.webp 640w, /_astro/tier-comparison.C5nIQyIl_Z2lvjkQ.webp 750w, /_astro/tier-comparison.C5nIQyIl_Z2etOrF.webp 828w, /_astro/tier-comparison.C5nIQyIl_Z26wz4.webp 1080w, /_astro/tier-comparison.C5nIQyIl_Z17voPl.webp 1202w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Three security tiers. Selection depends on threat model and team trust level.&lt;/em&gt;&lt;/p&gt;





































&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Tier&lt;/th&gt;&lt;th&gt;Input&lt;/th&gt;&lt;th&gt;Injection Risk&lt;/th&gt;&lt;th&gt;Approval Gate&lt;/th&gt;&lt;th&gt;Typical Feedback Time&lt;/th&gt;&lt;th&gt;Recommended For&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Tool output (JSON)&lt;/td&gt;&lt;td&gt;None&lt;/td&gt;&lt;td&gt;Human reviews artifact&lt;/td&gt;&lt;td&gt;2-5 min&lt;/td&gt;&lt;td&gt;Public repos, OSS, any external contributors&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;File stats + metadata&lt;/td&gt;&lt;td&gt;Low&lt;/td&gt;&lt;td&gt;Human pre-approval&lt;/td&gt;&lt;td&gt;1-3 min&lt;/td&gt;&lt;td&gt;Private repos, internal teams&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Full code diff&lt;/td&gt;&lt;td&gt;Controlled&lt;/td&gt;&lt;td&gt;Optional&lt;/td&gt;&lt;td&gt;&amp;#x3C;60 sec&lt;/td&gt;&lt;td&gt;Tiny trusted teams only&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 1: Tier comparison: speed, risk, and recommended use.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The decision framework is simple: start at Tier 1. Measure deployment velocity, security posture, and developer satisfaction. Only move to Tier 2 or 3 if team consensus is that the additional AI context outweighs the injection risk. Most teams never need to leave Tier 1.&lt;/p&gt;
&lt;h2 id=&quot;evolution-from-uncontrolled-to-managed-ai-analysis&quot;&gt;Evolution From Uncontrolled to Managed AI Analysis&lt;/h2&gt;
&lt;p&gt;The naive approach feeds AI the code diff directly and allows it to comment on the PR. This is fast, appears intelligent, and creates an injection surface. The improved approach layers security tiers on top, providing a decision framework that matches the threat model.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Evolution from uncontrolled AI analysis (high risk) to managed 3-tier model (risk controlled).&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; sizes=&quot;(min-width: 1438px) 1438px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;1438&quot; height=&quot;966&quot; src=&quot;/_astro/security-evolution.chJ-qijv_Z1D87se.webp&quot; srcset=&quot;/_astro/security-evolution.chJ-qijv_1mHcKF.webp 640w, /_astro/security-evolution.chJ-qijv_1Q3yT0.webp 750w, /_astro/security-evolution.chJ-qijv_Z1jW4Me.webp 828w, /_astro/security-evolution.chJ-qijv_b4yzQ.webp 1080w, /_astro/security-evolution.chJ-qijv_Z1UbC6r.webp 1280w, /_astro/security-evolution.chJ-qijv_Z1D87se.webp 1438w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 3: Evolution from uncontrolled AI analysis to risk-managed tiers.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The shift is architectural, not just operational. The evolution moves from “AI sees everything and decides” to “AI sees what’s safe and humans decide what matters.” This distinction enables both security and speed improvements.&lt;/p&gt;
&lt;h2 id=&quot;what-outcomes-does-ai-augmented-cicd-deliver&quot;&gt;What Outcomes Does AI-Augmented CI/CD Deliver?&lt;/h2&gt;
&lt;p&gt;First-review latency drops from 4–22 hours (Octoverse 2024) to under 5 minutes, a 50–250× reduction. Developers iterate faster because they receive feedback immediately. CI/CD pipelines do not stall waiting for human review availability.&lt;/p&gt;
&lt;p&gt;Quality improves because AI catches patterns humans miss during late-night reviews or context-switching. Linting issues get flagged automatically. Security tool outputs get analyzed for severity and context. Fewer critical issues reach production because they are caught earlier in the workflow.&lt;/p&gt;
&lt;p&gt;For broader observability patterns in AI agent workflows, including legacy system integration, see &lt;a href=&quot;/posts/ai-agents-legacy-roi&quot;&gt;AI agents in legacy systems&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Developer satisfaction increases when velocity and quality both improve. Engineers are not blocked by the review process. They receive comprehensive feedback without waiting. They trust the pipeline because it combines deterministic tools with AI insight and human judgment.&lt;/p&gt;
&lt;p&gt;Business outcomes are measurable. Deployment frequency increases, mean time to resolution decreases, and security incidents reduce. Engineering teams ship faster without sacrificing safety.&lt;/p&gt;
&lt;h2 id=&quot;implementation-guide&quot;&gt;Implementation Guide&lt;/h2&gt;
&lt;p&gt;Start with Tier 1. It provides maximum security with zero prompt injection risk. The &lt;a href=&quot;https://github.com/clouatre-labs/setup-goose-action/blob/main/examples/tier1-maximum-security.yml&quot;&gt;example workflow&lt;/a&gt; demonstrates the complete pattern. For AWS-native environments, &lt;a href=&quot;https://github.com/clouatre-labs/setup-kiro-action&quot;&gt;setup-kiro-action&lt;/a&gt; offers SIGV4 authentication without API keys in secrets.&lt;/p&gt;
&lt;p&gt;Baseline measurement establishes the starting point: current review latency, deployment frequency, and security incident rate. A two-week measurement period provides sufficient data for comparison. After AI integration, the same metrics reveal impact.&lt;/p&gt;
&lt;p&gt;Tier selection depends on threat model. External contributors and public repositories warrant Tier 1. Internal teams with trusted code may benefit from Tier 2 or Tier 3 context. The key is matching exposure level to trust level.&lt;/p&gt;
&lt;p&gt;The human gate remains essential. AI generates artifacts for review, not merge approvals. Engineers validate recommendations before acting. This preserves accountability while accelerating feedback cycles.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;For observability patterns in AI agent workflows, see &lt;a href=&quot;/posts/ai-observability-gaps&quot;&gt;AI Observability Gaps&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Boehm &amp;#x26; Basili, “Software Defect Reduction Top 10 List” (2001) — &lt;a href=&quot;https://www.cs.umd.edu/projects/SoftEng/ESEG/papers/82.78.pdf&quot;&gt;https://www.cs.umd.edu/projects/SoftEng/ESEG/papers/82.78.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Forsgren et al., “DevEx in Action: A study of its tangible impacts” (2024) — &lt;a href=&quot;https://dl.acm.org/doi/10.1145/3639443&quot;&gt;https://dl.acm.org/doi/10.1145/3639443&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub Octoverse 2024 — &lt;a href=&quot;https://github.blog/news-insights/octoverse/octoverse-2024/&quot;&gt;https://github.blog/news-insights/octoverse/octoverse-2024/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OWASP LLM Top 10 (2025 edition) – Prompt Injection #1 — &lt;a href=&quot;https://owasp.org/www-project-top-10-for-large-language-model-applications/&quot;&gt;https://owasp.org/www-project-top-10-for-large-language-model-applications/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Santos et al., “Modern code review in practice: A developer-centric study” (2024) — &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0164121224003327&quot;&gt;https://www.sciencedirect.com/science/article/pii/S0164121224003327&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ci-cd</category><category>devops</category><category>security</category><category>implementation-guide</category><author>Hugues Clouâtre</author></item><item><title>AI-Assisted Development: From Implementation to Judgment</title><link>https://clouatre.ca/posts/ai-assisted-development-judgment-over-implementation/</link><guid isPermaLink="true">https://clouatre.ca/posts/ai-assisted-development-judgment-over-implementation/</guid><description>From typing code to evaluating proposals. 70-80% time savings when AI explores options, you make the call. Real metrics from production.</description><pubDate>Thu, 05 Feb 2026 20:47:00 GMT</pubDate><content:encoded>&lt;p&gt;Software developers spend roughly equal time on meetings (12%) and coding (11%), with the remaining time distributed across debugging, architecture, reviews, and operational tasks. This fragmentation correlates with decreased productivity and satisfaction when it creates a gap between actual and ideal time allocation (Kumar et al., 2025).&lt;/p&gt;
&lt;p&gt;Traditional workflows treat implementation as the bottleneck, forcing a costly trade-off: explore multiple solutions (expensive) or ship the first working approach (fast but suboptimal). Large-scale projects face 50+ key decisions where &lt;a href=&quot;https://www.emilbacklund.com/p/a-cost-based-decision-framework-for&quot;&gt;“it’s simply infeasible to thoroughly research every single decision”&lt;/a&gt; before implementation (Backlund, 2024). The real constraint isn’t typing speed. It’s decision quality under fragmented time.&lt;/p&gt;
&lt;p&gt;AI-assisted development breaks this trade-off. Implementation becomes a review task. Expert judgment focuses on strategic decisions.&lt;/p&gt;
&lt;h2 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;&lt;details&gt;&lt;summary&gt;Contents&lt;/summary&gt;&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#how-does-ai-assisted-development-shift-time-allocation&quot;&gt;How Does AI-Assisted Development Shift Time Allocation?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#real-example-ci-modernization&quot;&gt;Real Example: CI Modernization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#real-example-matrix-operations-feature&quot;&gt;Real Example: Matrix Operations Feature&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-business-value-does-ai-assisted-development-deliver&quot;&gt;What Business Value Does AI-Assisted Development Deliver?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#decision-quality&quot;&gt;Decision Quality&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#senior-engineer-leverage&quot;&gt;Senior Engineer Leverage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#measured-time-savings&quot;&gt;Measured Time Savings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#strategic-impact&quot;&gt;Strategic Impact&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#how-do-recipes-codify-engineering-judgment&quot;&gt;How Do Recipes Codify Engineering Judgment?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#when-does-this-approach-work&quot;&gt;When Does This Approach Work?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-transformation&quot;&gt;The Transformation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;&lt;/details&gt;&lt;p&gt;&lt;/p&gt;
&lt;h2 id=&quot;how-does-ai-assisted-development-shift-time-allocation&quot;&gt;How Does AI-Assisted Development Shift Time Allocation?&lt;/h2&gt;
&lt;p&gt;AI assistants shift the focus from implementation to judgment. You spend minimal time reviewing code and most time applying strategic thinking.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://cacm.acm.org/research/measuring-github-copilots-impact-on-productivity/&quot;&gt;Controlled studies show&lt;/a&gt; 55% faster task completion with AI assistance. The gain isn’t typing speed. It’s preserving cognitive capacity for critical thinking.&lt;/p&gt;
&lt;p&gt;Critical thinking scales. Implementation doesn’t. You can evaluate 3 architectural approaches in the time it takes to implement one.&lt;/p&gt;
&lt;p&gt;Example: &lt;a href=&quot;https://github.com/block/goose&quot;&gt;Goose&lt;/a&gt; (open-source AI assistant) handles codebase analysis, implementation, testing, and documentation. You provide business context, architectural judgment, and approve decisions at critical gates.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Time allocation comparison: traditional vs AI-assisted development&quot; loading=&quot;eager&quot; decoding=&quot;sync&quot; fetchpriority=&quot;high&quot; sizes=&quot;(min-width: 1476px) 1476px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;1476&quot; height=&quot;980&quot; src=&quot;/_astro/time-allocation-comparison.BJ-r95NA_Z28RYjf.webp&quot; srcset=&quot;/_astro/time-allocation-comparison.BJ-r95NA_2sqSDw.webp 640w, /_astro/time-allocation-comparison.BJ-r95NA_Z1ExWIP.webp 750w, /_astro/time-allocation-comparison.BJ-r95NA_M7brn.webp 828w, /_astro/time-allocation-comparison.BJ-r95NA_Z6biT2.webp 1080w, /_astro/time-allocation-comparison.BJ-r95NA_sNnTx.webp 1280w, /_astro/time-allocation-comparison.BJ-r95NA_Z28RYjf.webp 1476w&quot;&gt;
&lt;em&gt;Figure 1: Traditional approach focuses on implementation overhead, AI-assisted approach maximizes strategic thinking&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;real-example-ci-modernization&quot;&gt;Real Example: CI Modernization&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Context:&lt;/strong&gt; &lt;code&gt;math-mcp-learning-server&lt;/code&gt; had no CI workflow. Legacy mypy was slow and unused.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The judgment call:&lt;/strong&gt; Build CI from scratch or adapt patterns from a similar project?&lt;/p&gt;
&lt;p&gt;AI identifies reusable patterns: Ruff (linter/formatter) + uv (package manager) + pytest-cov. I review the risk assessment, verify the tooling choices match project needs, and confirm zero regressions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;~20 minutes vs 3-4 hours from scratch&lt;/li&gt;
&lt;li&gt;CI runtime: 5 seconds (&lt;a href=&quot;https://github.com/astral-sh/ruff&quot;&gt;Ruff is 10-100x faster&lt;/a&gt; than legacy tooling)&lt;/li&gt;
&lt;li&gt;67 tests passing, 83% coverage&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href=&quot;https://github.com/clouatre-labs/math-mcp-learning-server/pull/52&quot;&gt;PR #52 - Add modern CI workflow&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;real-example-matrix-operations-feature&quot;&gt;Real Example: Matrix Operations Feature&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Context:&lt;/strong&gt; &lt;code&gt;math-mcp-learning-server&lt;/code&gt; needed 5 matrix operation tools with NumPy integration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The judgment call:&lt;/strong&gt; Implement incrementally (one tool per PR) or batch with shared validation patterns?&lt;/p&gt;
&lt;p&gt;AI identifies common infrastructure: dimension validation, ToolError handling, DoS prevention via size limits. I review API design, error handling conventions, and security limits.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2 minutes from PR creation to merge&lt;/li&gt;
&lt;li&gt;5 tools, 21 tests, 395 lines added&lt;/li&gt;
&lt;li&gt;Patterns reusable for future tools&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href=&quot;https://github.com/clouatre-labs/math-mcp-learning-server/pull/109&quot;&gt;PR #109 - Implement 5 matrix operation tools&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-business-value-does-ai-assisted-development-deliver&quot;&gt;What Business Value Does AI-Assisted Development Deliver?&lt;/h2&gt;
&lt;h3 id=&quot;decision-quality&quot;&gt;Decision Quality&lt;/h3&gt;
&lt;p&gt;The fundamental shift is from “implement first, evaluate later” to “evaluate first, implement once”. With AI handling implementation, you can explore 2-3 alternatives per decision instead of committing to the first working approach. &lt;a href=&quot;https://www.computer.org/resources/software-engineering-economics&quot;&gt;Research confirms&lt;/a&gt; “the best solution must first become a candidate before being selected… More candidates increase the likelihood the ideal solution is among them.”&lt;/p&gt;
&lt;p&gt;Time to validated options drops from 1.5-6 hours to 15 minutes to 1 hour. Architectural reversals decrease because upfront analysis improves.&lt;/p&gt;
&lt;h3 id=&quot;senior-engineer-leverage&quot;&gt;Senior Engineer Leverage&lt;/h3&gt;





























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Traditional&lt;/th&gt;&lt;th&gt;AI-Assisted&lt;/th&gt;&lt;th&gt;Business Impact&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Time allocation&lt;/td&gt;&lt;td&gt;Implementation-heavy&lt;/td&gt;&lt;td&gt;Judgment-focused&lt;/td&gt;&lt;td&gt;Maximize expert leverage&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Scope per engineer&lt;/td&gt;&lt;td&gt;1-2 specialties&lt;/td&gt;&lt;td&gt;Full stack&lt;/td&gt;&lt;td&gt;Eliminate specialist bottlenecks&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Exploration cost&lt;/td&gt;&lt;td&gt;High (must implement)&lt;/td&gt;&lt;td&gt;Low (preview and abandon)&lt;/td&gt;&lt;td&gt;Ship best solution, not first&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 1: Comparison of senior engineer time allocation and scope between traditional and AI-assisted approaches&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This transformation aligns with research on AI-assisted development: &lt;a href=&quot;https://resources.github.com/learn/pathways/copilot/essentials/measuring-the-impact-of-github-copilot/&quot;&gt;GitHub studies found&lt;/a&gt; 60-75% of developers report increased job fulfillment and 87% preserve mental effort on repetitive tasks when using AI coding assistants.&lt;/p&gt;
&lt;h3 id=&quot;measured-time-savings&quot;&gt;Measured Time Savings&lt;/h3&gt;





























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Task&lt;/th&gt;&lt;th&gt;AI-Assisted&lt;/th&gt;&lt;th&gt;Traditional&lt;/th&gt;&lt;th&gt;Savings&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;CI modernization&lt;/td&gt;&lt;td&gt;~20 minutes&lt;/td&gt;&lt;td&gt;3-4 hours&lt;/td&gt;&lt;td&gt;~90%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Matrix operations (5 tools)&lt;/td&gt;&lt;td&gt;2 minutes&lt;/td&gt;&lt;td&gt;1-2 hours estimated&lt;/td&gt;&lt;td&gt;~95%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;DNS migration&lt;/td&gt;&lt;td&gt;&lt;a href=&quot;/posts/zero-downtime-dns-migration/&quot;&gt;2 hours&lt;/a&gt;&lt;/td&gt;&lt;td&gt;4-6 hours&lt;/td&gt;&lt;td&gt;~60%&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 2: Measured time savings across infrastructure and DevOps tasks&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/&quot;&gt;Industry research validates these gains&lt;/a&gt;: 26% overall productivity increase across 4,867 developers, with &lt;a href=&quot;https://arxiv.org/abs/2406.17910&quot;&gt;30-50% time savings&lt;/a&gt; on repetitive tasks in enterprise settings.&lt;/p&gt;
&lt;p&gt;At 10 infrastructure tasks per month, this recovers ~60 hours per year per engineer. That is 1.5 weeks of productive time returned to strategic work.&lt;/p&gt;
&lt;h3 id=&quot;strategic-impact&quot;&gt;Strategic Impact&lt;/h3&gt;






























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Outcome&lt;/th&gt;&lt;th&gt;Shift&lt;/th&gt;&lt;th&gt;Business Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Engineer capability&lt;/td&gt;&lt;td&gt;1-2 specialties → Full-stack&lt;/td&gt;&lt;td&gt;Eliminate specialist bottlenecks&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Production risk&lt;/td&gt;&lt;td&gt;Manual review → AI + gates&lt;/td&gt;&lt;td&gt;Governance without slowdown&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Knowledge retention&lt;/td&gt;&lt;td&gt;Tribal → Codified recipes&lt;/td&gt;&lt;td&gt;Team continuity&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Onboarding time&lt;/td&gt;&lt;td&gt;Weeks → Hours&lt;/td&gt;&lt;td&gt;Faster scaling&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 3: Strategic outcomes from traditional to AI-assisted workflows&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;how-do-recipes-codify-engineering-judgment&quot;&gt;How Do Recipes Codify Engineering Judgment?&lt;/h2&gt;
&lt;p&gt;Goose uses “recipes”: YAML workflow definitions that codify your judgment and process. The key innovation is mandatory STOP points where AI proposes and you approve before proceeding.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5-phase workflow:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;ANALYZE&lt;/strong&gt; - Understand codebase and problem&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RESEARCH&lt;/strong&gt; - Explore 2-3 solution approaches with trade-offs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PLAN&lt;/strong&gt; - Detailed implementation plan&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;IMPLEMENT&lt;/strong&gt; - Code, tests, documentation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PREPARE&lt;/strong&gt; - Create PR, verify branch, push&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img alt=&quot;Recipe workflow diagram with 5 STOP gates for human approval&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; sizes=&quot;(min-width: 255px) 255px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;255&quot; height=&quot;2327&quot; src=&quot;/_astro/recipe-workflow.Gx5nzfPz_ZDkgRB.webp&quot; srcset=&quot;/_astro/recipe-workflow.Gx5nzfPz_ZDkgRB.webp 255w&quot;&gt;
&lt;em&gt;Figure 2: Recipe workflow enforces governance through 5 mandatory approval gates - AI proposes, human judges&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This matters for four reasons. First, repeatable process replaces ad-hoc prompting. Second, audit trails capture every decision in PR history. Third, human judgment gates ensure governance without blind automation. Fourth, codified expertise becomes an onboarding tool.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example GATE pattern:&lt;/strong&gt;&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;## Phase 1: RESEARCH&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;Understand scope and constraints&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Read issue/PR description, linked discussions&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Identify affected files with `rg` and `analyze`&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Note CI requirements, test patterns, coding standards&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;### GATE: Research Summary  &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;*&lt;/span&gt;&lt;span style=&quot;--shiki-light:#DF8E1D;--shiki-dark:#EED49F&quot;&gt;*STOP&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt; -&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Present to user:**&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Problem statement (1-2 sentences)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Affected files and scope&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Constraints discovered (CI, tests, dependencies)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; 2-3 possible approaches with trade-offs&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#EA76CB;--shiki-dark:#F5BDE6&quot;&gt;*&lt;/span&gt;&lt;span style=&quot;--shiki-light:#DF8E1D;--shiki-dark:#EED49F&quot;&gt;*ASK:**&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;Which approach do you prefer?&quot;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;~/.config/goose/recipes/goose-coder.yaml&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 1: GATE pattern from production recipe. AI presents constrained options, human selects direction before any code is written.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Branch hygiene is enforced by global githooks: pre-push blocks protected branches, commit-msg requires conventional commits with DCO. The recipe ensures work starts on feature branches. Full recipe: &lt;a href=&quot;https://gist.github.com/clouatre/11e8afc102d659420921db6fcff4409a&quot;&gt;goose-coder.yaml on GitHub Gist&lt;/a&gt;&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# Conventional commit format&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;CONVENTIONAL_REGEX&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&apos;^(feat|fix|docs|...)(\([a-z0-9_-]+\))?(!)?: .{1,100}$&apos;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;if&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; !&lt;/span&gt;&lt;span style=&quot;--shiki-light:#D20F39;--shiki-light-font-style:italic;--shiki-dark:#ED8796;--shiki-dark-font-style:italic&quot;&gt; echo&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;$COMMIT_MSG&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; |&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt; grep&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; -qE&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;$CONVENTIONAL_REGEX&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; then&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#D20F39;--shiki-light-font-style:italic;--shiki-dark:#ED8796;--shiki-dark-font-style:italic&quot;&gt;    echo&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;BLOCKED: Commit message must follow conventional format&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D20F39;--shiki-light-font-style:italic;--shiki-dark:#ED8796;--shiki-dark-font-style:italic&quot;&gt;    exit&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;fi&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# DCO required&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;if&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt; !&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt; grep&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; -q&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;^Signed-off-by:&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#4C4F69;--shiki-dark:#CAD3F5&quot;&gt;$COMMIT_MSG_FILE&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;;&lt;/span&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt; then&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#D20F39;--shiki-light-font-style:italic;--shiki-dark:#ED8796;--shiki-dark-font-style:italic&quot;&gt;    echo&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &quot;BLOCKED: Missing DCO (Signed-off-by)&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#D20F39;--shiki-light-font-style:italic;--shiki-dark:#ED8796;--shiki-dark-font-style:italic&quot;&gt;    exit&lt;/span&gt;&lt;span style=&quot;--shiki-light:#FE640B;--shiki-dark:#F5A97F&quot;&gt; 1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#8839EF;--shiki-dark:#C6A0F6&quot;&gt;fi&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;~/.githooks/commit-msg&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 2: Global commit-msg hook enforces conventional commits and DCO. Githooks provide hard blocks; recipes provide guidance.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;when-does-this-approach-work&quot;&gt;When Does This Approach Work?&lt;/h2&gt;






























&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Task Type&lt;/th&gt;&lt;th&gt;AI-Assisted Fit&lt;/th&gt;&lt;th&gt;Evidence&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;CI/DevOps automation&lt;/td&gt;&lt;td&gt;High&lt;/td&gt;&lt;td&gt;20 min vs 3-4 hrs (PR #52)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Feature implementation&lt;/td&gt;&lt;td&gt;High&lt;/td&gt;&lt;td&gt;2 min for 5 tools (PR #109)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Boilerplate generation&lt;/td&gt;&lt;td&gt;High&lt;/td&gt;&lt;td&gt;Common pattern in both PRs&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Greenfield architecture&lt;/td&gt;&lt;td&gt;Medium&lt;/td&gt;&lt;td&gt;More judgment gates needed&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 4: Task type fit for AI-assisted development based on production experience&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Low-fit tasks include security-sensitive code (AI may miss edge cases), regex/parsing logic (subtle bugs compound), and legacy systems without documentation (AI lacks context).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Critical success factor:&lt;/strong&gt; You must have expertise to evaluate proposals. AI amplifies judgment, it does not replace it.&lt;/p&gt;
&lt;h2 id=&quot;the-transformation&quot;&gt;The Transformation&lt;/h2&gt;
&lt;p&gt;Traditional software economics fragments expert time across operational overhead. When senior engineers spend only 11% of their time coding, the bottleneck isn’t implementation speed. It’s context-switching between debugging, reviews, meetings, and operational tasks.&lt;/p&gt;
&lt;p&gt;AI-assisted development consolidates this fragmentation. AI handles the operational work (debugging patterns, boilerplate, documentation), freeing human attention for architectural decisions and strategic judgment. The scarce resource shifts from “time to implement” to “ability to decide.”&lt;/p&gt;
&lt;p&gt;When implementation becomes cheap (minutes instead of hours), exploration becomes affordable. You can evaluate three approaches, prototype two, and ship the best one, all in less time than the traditional single-path approach.&lt;/p&gt;
&lt;p&gt;For technical leaders, this amplifies your most expensive resource: expert judgment. When your bottleneck is making the right decision, not finding time to code, AI becomes a strategic multiplier.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;ACM, “Measuring GitHub Copilot’s Impact on Productivity” (2024) — &lt;a href=&quot;https://cacm.acm.org/research/measuring-github-copilots-impact-on-productivity/&quot;&gt;https://cacm.acm.org/research/measuring-github-copilots-impact-on-productivity/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Backlund, Emil, “A Cost-Based Decision Framework for Software Engineers” (2024) — &lt;a href=&quot;https://www.emilbacklund.com/p/a-cost-based-decision-framework-for&quot;&gt;https://www.emilbacklund.com/p/a-cost-based-decision-framework-for&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub, “Research: Quantifying GitHub Copilot’s impact on developer productivity and happiness” (2024) — &lt;a href=&quot;https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/&quot;&gt;https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub Resources, “Measuring the Impact of GitHub Copilot” (2024) — &lt;a href=&quot;https://resources.github.com/learn/pathways/copilot/essentials/measuring-the-impact-of-github-copilot/&quot;&gt;https://resources.github.com/learn/pathways/copilot/essentials/measuring-the-impact-of-github-copilot/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;IEEE Computer Society, “Software Engineering Economics and Declining Budgets” (2024) — &lt;a href=&quot;https://www.computer.org/resources/software-engineering-economics&quot;&gt;https://www.computer.org/resources/software-engineering-economics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Kumar et al., “Time Warp: The Gap Between Developers’ Ideal vs Actual Workweeks in an AI-Driven Era” (2025) — &lt;a href=&quot;https://arxiv.org/abs/2502.15287&quot;&gt;https://arxiv.org/abs/2502.15287&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Pandey et al., “Evaluating the Efficiency and Challenges of GitHub Copilot in Real-World Projects” (2024) — &lt;a href=&quot;https://arxiv.org/abs/2406.17910&quot;&gt;https://arxiv.org/abs/2406.17910&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>automation</category><category>devops</category><category>goose</category><author>Hugues Clouâtre</author></item><item><title>Migrating to Cloudflare Pages: One Prompt, Zero Manual Work</title><link>https://clouatre.ca/posts/zero-downtime-dns-migration/</link><guid isPermaLink="true">https://clouatre.ca/posts/zero-downtime-dns-migration/</guid><description>GitHub Pages to Cloudflare Pages in 2 hours with zero downtime. AI-assisted DNS, hosting, and CI/CD migration with validation and real metrics.</description><pubDate>Thu, 29 Jan 2026 02:30:05 GMT</pubDate><content:encoded>&lt;p&gt;We migrated website infrastructure from Amazon Route53 + GitHub Pages to Cloudflare &lt;strong&gt;in 2 hours, during business hours&lt;/strong&gt;. This included hosting, DNS, and CI/CD. Zero downtime. Zero manual commands.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The entire migration:&lt;/strong&gt; One prompt. Then review and approve AI-proposed changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why this matters for executives:&lt;/strong&gt; DNS migrations traditionally require specialized DevOps knowledge, extended maintenance windows, and carry significant risk. A single misconfigured record can break email, take down services, or disrupt business operations for hours. This approach eliminates that risk through programmatic validation and automation.&lt;/p&gt;
&lt;p&gt;The only manual step: Creating a Cloudflare API token.&lt;/p&gt;
&lt;h2 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;&lt;details&gt;&lt;summary&gt;Contents&lt;/summary&gt;&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-starting-point&quot;&gt;The Starting Point&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-goal&quot;&gt;The Goal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#why-this-matters-for-your-business&quot;&gt;Why This Matters for Your Business&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#the-traditional-challenge&quot;&gt;The Traditional Challenge&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-difference-with-ai-assistance&quot;&gt;The Difference with AI Assistance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#strategic-value&quot;&gt;Strategic Value&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-starting-prompt&quot;&gt;The Starting Prompt&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-got-automated&quot;&gt;What Got Automated&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#discovery-and-cleanup&quot;&gt;Discovery and Cleanup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#pre-migration-validation&quot;&gt;Pre-Migration Validation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#cicd-reconfiguration&quot;&gt;CI/CD Reconfiguration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#governance-trail&quot;&gt;Governance Trail&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#preview-infrastructure&quot;&gt;Preview Infrastructure&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-only-manual-step&quot;&gt;The Only Manual Step&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#before-fragmented-infrastructure&quot;&gt;BEFORE: Fragmented Infrastructure&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#after-unified-platform&quot;&gt;AFTER: Unified Platform&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#results&quot;&gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-business-impact-does-ai-assisted-migration-deliver&quot;&gt;What Business Impact Does AI-Assisted Migration Deliver?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#key-lessons&quot;&gt;Key Lessons&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#1-the-ai-stack-handles-implementation-details&quot;&gt;1. The AI Stack Handles Implementation Details&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#2-pre-validation-eliminates-risk&quot;&gt;2. Pre-Validation Eliminates Risk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#3-automate-record-migration&quot;&gt;3. Automate Record Migration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#4-preview-deployments-change-everything&quot;&gt;4. Preview Deployments Change Everything&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#5-the-paradigm-shift-from-careful-planning-to-confident-execution&quot;&gt;5. The Paradigm Shift: From Careful Planning to Confident Execution&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#when-does-this-approach-apply&quot;&gt;When Does This Approach Apply?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-is-the-roi-of-ai-assisted-infrastructure-migration&quot;&gt;What Is the ROI of AI-Assisted Infrastructure Migration?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;&lt;/details&gt;&lt;p&gt;&lt;/p&gt;
&lt;h2 id=&quot;the-starting-point&quot;&gt;The Starting Point&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Infrastructure:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Code repository:&lt;/strong&gt; GitHub (unchanged after migration)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hosting:&lt;/strong&gt; GitHub Pages&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DNS:&lt;/strong&gt; Amazon Route53 (20+ DNS records)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Domain:&lt;/strong&gt; Squarespace (registrar)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CI/CD:&lt;/strong&gt; GitHub Actions → GitHub Pages&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Critical services:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Email (5 MX records for Google Workspace)&lt;/li&gt;
&lt;li&gt;Google Workspace services (Calendar, Contacts, Sites)&lt;/li&gt;
&lt;li&gt;SSL validation records&lt;/li&gt;
&lt;li&gt;Legacy service records (needed cleanup)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;the-goal&quot;&gt;The Goal&lt;/h2&gt;
&lt;p&gt;Migrate everything to Cloudflare:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Faster DNS globally&lt;/li&gt;
&lt;li&gt;Faster deployments&lt;/li&gt;
&lt;li&gt;Simplified infrastructure (one platform)&lt;/li&gt;
&lt;li&gt;Preview deployments for PRs&lt;/li&gt;
&lt;li&gt;Zero downtime (email cannot break)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;why-this-matters-for-your-business&quot;&gt;Why This Matters for Your Business&lt;/h2&gt;
&lt;h3 id=&quot;the-traditional-challenge&quot;&gt;The Traditional Challenge&lt;/h3&gt;
&lt;p&gt;DNS migrations can be done with zero downtime, but they require extensive planning and careful execution. One misconfigured MX record means email down for hours, and &lt;a href=&quot;https://uptimeinstitute.com/about-ui/press-releases/uptime-announces-annual-outage-analysis-report-2025&quot;&gt;human error causes 66-80% of outages&lt;/a&gt; (Uptime Institute, 2025). Imagine missing customer orders, support tickets, or sales inquiries during your peak season.&lt;/p&gt;
&lt;h3 id=&quot;the-difference-with-ai-assistance&quot;&gt;The Difference with AI Assistance&lt;/h3&gt;
&lt;p&gt;Same zero-downtime outcome, but with programmatic validation instead of manual checklists. Business hours execution becomes feasible because pre-validation eliminates guesswork. Teams without specialized DevOps expertise can execute complex migrations confidently.&lt;/p&gt;
&lt;h3 id=&quot;strategic-value&quot;&gt;Strategic Value&lt;/h3&gt;
&lt;p&gt;Infrastructure changes shift from high-stress, weekend events to business-hours execution with automated validation. Experienced engineers still evaluate proposals, but with dramatically reduced risk and time investment. Preview deployments enable stakeholder review before release.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The transformation:&lt;/strong&gt; From possible-but-stressful to routine-and-confident.&lt;/p&gt;
&lt;h2 id=&quot;the-starting-prompt&quot;&gt;The Starting Prompt&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What we told Goose (&lt;a href=&quot;https://github.com/block/goose&quot;&gt;open-source AI assistant&lt;/a&gt; powered by Claude Sonnet 4.5):&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I want to migrate from GitHub Pages to Cloudflare Pages.
The domain clouatre.ca is registered at Squarespace.
I need zero downtime - email and Google Workspace cannot break.
Check if DNSSEC is enabled and handle it appropriately.
Use a risk-adverse approach.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This prompt started a &lt;a href=&quot;/posts/ai-assisted-development-judgment-over-implementation/#the-recipe-model-codifying-judgment&quot;&gt;5-phase recipe workflow&lt;/a&gt; with mandatory approval gates: I reviewed and approved each phase (Analyze → Research → Plan → Implement → Prepare). Not autonomous execution, AI-assisted with human governance at every decision point.&lt;/p&gt;
&lt;p&gt;We didn’t need to specify where DNS was hosted (discovered Route53 automatically), how many DNS records existed (found 20+), which records were critical vs obsolete, how to configure Cloudflare Pages, or how to set up GitHub Actions for Cloudflare. The AI handled discovery, analysis, and execution.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Claude Sonnet 4.5 (reasoning):&lt;/strong&gt; Analyzes context, decides what to do, discovers infrastructure, validates approach&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Goose (execution layer):&lt;/strong&gt; Provides tool access (shell, git, AWS CLI, gh), manages conversation state, enforces approval gates&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Human (governance):&lt;/strong&gt; Reviews proposals at gates, approves/rejects, maintains control&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Critical:&lt;/strong&gt; We reviewed every decision. The AI proposed, we approved. The combination of automation + human judgment enabled confidence.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Cloudflare migration workflow diagram showing approval gates and validation steps&quot; loading=&quot;eager&quot; decoding=&quot;sync&quot; fetchpriority=&quot;high&quot; sizes=&quot;(min-width: 338px) 338px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;338&quot; height=&quot;1549&quot; src=&quot;/_astro/migration-workflow.DMl2TQ9g_1cP6Er.webp&quot; srcset=&quot;/_astro/migration-workflow.DMl2TQ9g_1cP6Er.webp 338w&quot;&gt;
&lt;em&gt;Figure 1: AI-assisted migration workflow with two human approval gates ensuring governance and confidence&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-got-automated&quot;&gt;What Got Automated&lt;/h2&gt;
&lt;p&gt;The migration workflow orchestrated five critical phases.&lt;/p&gt;
&lt;h3 id=&quot;discovery-and-cleanup&quot;&gt;Discovery and Cleanup&lt;/h3&gt;
&lt;p&gt;Claude analyzed 20+ Route53 records and separated signal from noise: 15 critical records (email, Google Workspace, SSL validation) and 5 obsolete entries (old servers, expired validations). DNSSEC verification came back negative, confirming no migration blocker.&lt;/p&gt;
&lt;h3 id=&quot;pre-migration-validation&quot;&gt;Pre-Migration Validation&lt;/h3&gt;
&lt;p&gt;Records were exported from Route53 and imported to Cloudflare via APIs, then tested against Cloudflare nameservers before switching. This included verifying email servers (MX priorities), SPF, DKIM, DMARC (exact TXT values), CNAMEs (Google Workspace), and comparing TTL values between source and target. The validation report confirmed 100% match.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# Verify records match before switching nameservers&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt;dig&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; @nameserver1.cloudflare.com&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; clouatre.ca&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; MX&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; +short&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# Output: 1 aspmx.l.google.com. (matches Route53)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt;diff&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &amp;#x3C;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt;aws&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; route53 list-resource-record-sets)&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; &amp;#x3C;(&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-light-font-style:italic;--shiki-dark:#8AADF4;--shiki-dark-font-style:italic&quot;&gt;curl&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; cloudflare-api)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# Output: (empty = 100% match, zero risk)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;scripts/validate-cloudflare-dns.sh&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 1: Pre-validation against Cloudflare nameservers before switching (zero output from diff = zero risk)&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;cicd-reconfiguration&quot;&gt;CI/CD Reconfiguration&lt;/h3&gt;
&lt;p&gt;GitHub Actions were updated to deploy to Cloudflare Pages via wrangler (Cloudflare’s CLI), with base URL fixes (GitHub’s &lt;code&gt;/repo/&lt;/code&gt; path to root &lt;code&gt;/&lt;/code&gt;) and a preview deployment workflow with 7-day auto-cleanup. Result: 38-second deploys, down from 5-8 minutes.&lt;/p&gt;
&lt;pre class=&quot;astro-code astro-code-themes catppuccin-latte catppuccin-macchiato has-highlighted mt-8&quot; style=&quot;--shiki-light:#4c4f69;--shiki-dark:#cad3f5;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#24273a; overflow-x: auto;--file-name-offset: -0.75rem;&quot; tabindex=&quot;0&quot; data-language=&quot;yaml&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-light-font-style:italic;--shiki-dark:#939AB7;--shiki-dark-font-style:italic&quot;&gt;# Cloudflare Pages deployment (38-second deploys)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#7C7F93;--shiki-dark:#939AB7&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt; name&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; Deploy to Cloudflare Pages&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  uses&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; cloudflare/wrangler-action@v3&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;  with&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    apiToken&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ${{ secrets.CLOUDFLARE_API_TOKEN }}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    accountId&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line highlighted&quot;&gt;&lt;span style=&quot;--shiki-light:#1E66F5;--shiki-dark:#8AADF4&quot;&gt;    command&lt;/span&gt;&lt;span style=&quot;--shiki-light:#179299;--shiki-dark:#8BD5CA&quot;&gt;:&lt;/span&gt;&lt;span style=&quot;--shiki-light:#40A02B;--shiki-dark:#A6DA95&quot;&gt; pages deploy dist --project-name=clouatre-ca&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;span class=&quot;absolute py-1 text-foreground text-xs font-medium leading-4 pl-4 pr-2 before:inline-block before:size-1 before:bg-green-500 before:rounded-full before:absolute before:top-[45%] before:left-2 left-2 top-(--file-name-offset) border rounded-md bg-background&quot;&gt;.github/workflows/deploy.yml&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Code Snippet 2: GitHub Actions deployment to Cloudflare Pages (replaced GitHub Pages action for 88% faster deploys)&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&quot;governance-trail&quot;&gt;Governance Trail&lt;/h3&gt;
&lt;p&gt;The assistant created PRs with migration context, rationale, and rollback procedures. Every change was reviewable before production, creating an audit trail for compliance.&lt;/p&gt;
&lt;h3 id=&quot;preview-infrastructure&quot;&gt;Preview Infrastructure&lt;/h3&gt;
&lt;p&gt;Every branch gets a preview URL automatically. Stakeholders can review before merge, and the system handles auto-cleanup with zero maintenance.&lt;/p&gt;
&lt;h2 id=&quot;the-only-manual-step&quot;&gt;The Only Manual Step&lt;/h2&gt;
&lt;p&gt;Creating a Cloudflare API token (2 minutes):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Cloudflare dashboard → API Tokens&lt;/li&gt;
&lt;li&gt;Create token with Pages permissions&lt;/li&gt;
&lt;li&gt;Store in GitHub secrets&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Everything else: automated.&lt;/p&gt;
&lt;h3 id=&quot;before-fragmented-infrastructure&quot;&gt;BEFORE: Fragmented Infrastructure&lt;/h3&gt;
&lt;p&gt;&lt;img alt=&quot;Before migration infrastructure&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; sizes=&quot;(min-width: 424px) 424px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;424&quot; height=&quot;550&quot; src=&quot;/_astro/infrastructure-before.BjhE7CD__2rawmX.webp&quot; srcset=&quot;/_astro/infrastructure-before.BjhE7CD__2rawmX.webp 424w&quot;&gt;&lt;/p&gt;
&lt;h3 id=&quot;after-unified-platform&quot;&gt;AFTER: Unified Platform&lt;/h3&gt;
&lt;p&gt;&lt;img alt=&quot;After migration infrastructure&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; sizes=&quot;(min-width: 407px) 407px, 100vw&quot;  data-astro-image=&quot;constrained&quot; width=&quot;407&quot; height=&quot;598&quot; src=&quot;/_astro/infrastructure-after.BuYBnjIz_8TnH1.webp&quot; srcset=&quot;/_astro/infrastructure-after.BuYBnjIz_8TnH1.webp 407w&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 2: Infrastructure transformation - from fragmented AWS/GitHub setup to unified Cloudflare platform&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;
&lt;p&gt;Traditional manual DNS migrations typically require 4-6 hours of focused work and weekend execution windows to minimize business risk: planning, exporting records, importing, testing, monitoring propagation. The stakes are high. &lt;a href=&quot;https://www.splunk.com/en_us/newsroom/press-releases/2024/conf24-splunk-report-shows-downtime-costs-global-2000-companies-400-billion-annually.html&quot;&gt;Downtime costs Global 2000 companies $400B annually&lt;/a&gt; (Splunk/Oxford Economics, 2024).&lt;/p&gt;









































&lt;table tabindex=&quot;0&quot;&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Before&lt;/th&gt;&lt;th&gt;After&lt;/th&gt;&lt;th&gt;Business Impact&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;DNS Resolution&lt;/td&gt;&lt;td&gt;20-30ms&lt;/td&gt;&lt;td&gt;&lt;a href=&quot;https://www.dnsperf.com/&quot;&gt;10-15ms&lt;/a&gt;&lt;/td&gt;&lt;td&gt;50% faster global access&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Deploy Time&lt;/td&gt;&lt;td&gt;5-8 min&lt;/td&gt;&lt;td&gt;38 sec&lt;/td&gt;&lt;td&gt;&lt;strong&gt;88% reduction&lt;/strong&gt; - 10x faster iteration&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Platform Cost&lt;/td&gt;&lt;td&gt;Route53: $12/year&lt;/td&gt;&lt;td&gt;Cloudflare: Free&lt;/td&gt;&lt;td&gt;Cost-neutral migration&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Preview Deployments&lt;/td&gt;&lt;td&gt;None&lt;/td&gt;&lt;td&gt;Per PR&lt;/td&gt;&lt;td&gt;Catch issues before production&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Migration Window&lt;/td&gt;&lt;td&gt;Weekend (risk mitigation)&lt;/td&gt;&lt;td&gt;2 hours, business hours&lt;/td&gt;&lt;td&gt;Eliminates deployment stress&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Table 1: Before and after metrics - Complete migration (DNS + Hosting + CI/CD) completed in 2 hours, zero downtime, zero manual commands&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-business-impact-does-ai-assisted-migration-deliver&quot;&gt;What Business Impact Does AI-Assisted Migration Deliver?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What this approach enables:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reduce specialized knowledge dependency&lt;/strong&gt; - DevOps tasks no longer require memorizing cloud provider CLIs, DNS record formats, or deployment configurations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lower operational risk&lt;/strong&gt; - Programmatic validation means migrations happen with confidence, not guesswork&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faster iteration&lt;/strong&gt; - Preview deployments enable stakeholder review before production release&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost efficiency&lt;/strong&gt; - Reduced deployment time by 88% (5-8min → 38sec), freeing developer time for feature work&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Who benefits:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Small businesses without dedicated DevOps teams&lt;/li&gt;
&lt;li&gt;Technical leaders managing infrastructure migrations&lt;/li&gt;
&lt;li&gt;Teams wanting to reduce deployment anxiety and increase velocity&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;key-lessons&quot;&gt;Key Lessons&lt;/h2&gt;
&lt;h3 id=&quot;1-the-ai-stack-handles-implementation-details&quot;&gt;1. The AI Stack Handles Implementation Details&lt;/h3&gt;
&lt;p&gt;You still need to understand what you’re migrating, but you don’t need to remember exact API syntax, AWS CLI flags for Route53 operations, Cloudflare API endpoints, DNS record format specifics, or YAML workflow syntax.&lt;/p&gt;
&lt;p&gt;Claude discovered our infrastructure (Route53) and analyzed the records. Goose orchestrated the execution with tool access. We provided the goals and constraints, reviewed the approach, and approved changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Value:&lt;/strong&gt; Reduces specialized knowledge requirement, eliminates manual typos, compresses migration timeline from 4-6 hours to 2 hours. &lt;a href=&quot;https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/unleashing-developer-productivity-with-generative-ai&quot;&gt;McKinsey research shows developers complete tasks up to 2x faster with AI assistance&lt;/a&gt; (2023).&lt;/p&gt;
&lt;h3 id=&quot;2-pre-validation-eliminates-risk&quot;&gt;2. Pre-Validation Eliminates Risk&lt;/h3&gt;
&lt;p&gt;All DNS records were tested against Cloudflare’s nameservers before switching. This included email servers, Google Workspace records, and SSL validation. The process queried Cloudflare nameservers for each record type, verified all 5 MX records, verified TXT records (SPF, DKIM, DMARC), verified CNAME records (Google Workspace services), and generated a validation report.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Business outcome:&lt;/strong&gt; We knew email, Google Workspace, and website would work before changing nameservers. Zero guessing.&lt;/p&gt;
&lt;h3 id=&quot;3-automate-record-migration&quot;&gt;3. Automate Record Migration&lt;/h3&gt;
&lt;p&gt;20+ DNS records, each with specific formats, priorities, TTLs. Manual copying guarantees typos. APIs provided accuracy: export from Route53 (AWS CLI), import to Cloudflare (API), and programmatic comparison to verify all matched.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Zero typos. Zero manual record editing.&lt;/p&gt;
&lt;h3 id=&quot;4-preview-deployments-change-everything&quot;&gt;4. Preview Deployments Change Everything&lt;/h3&gt;
&lt;p&gt;Preview deployments reduce deployment anxiety, catch issues early, enable stakeholder review, and enable faster iteration. For technical leaders, preview deployments shift risk from production to staging, enabling confident releases.&lt;/p&gt;
&lt;h3 id=&quot;5-the-paradigm-shift-from-careful-planning-to-confident-execution&quot;&gt;5. The Paradigm Shift: From Careful Planning to Confident Execution&lt;/h3&gt;
&lt;p&gt;Traditional DNS migrations rely on weekend deployment windows (lower risk, higher stress), manual command execution (careful, but one typo equals disaster), sequential testing after switching (discover errors in production), specialized knowledge (dig syntax, DNS formats, cloud CLIs), and extensive planning with checklists (mitigates risk but time-intensive).&lt;/p&gt;
&lt;p&gt;AI-assisted migrations enable business hours execution (confidence through pre-validation), programmatic execution (eliminates manual typos), pre-validated testing (know it works before switching), offloaded domain expertise (AI handles implementation syntax), and less planning overhead (validation happens automatically).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The transformation:&lt;/strong&gt; From “plan exhaustively and execute carefully” to “validate programmatically and execute confidently.”&lt;/p&gt;
&lt;p&gt;We knew every record worked before switching. No deployment anxiety, no weekend stress, no contingency planning. Just confidence through programmatic validation.&lt;/p&gt;
&lt;h2 id=&quot;when-does-this-approach-apply&quot;&gt;When Does This Approach Apply?&lt;/h2&gt;
&lt;p&gt;This approach works best for infrastructure migrations (DNS, hosting, CI/CD platforms) where teams lack specialized DevOps resources but need zero-downtime execution and audit trails. Requirements include AI assistant with CLI/API access (Goose or similar), API access to source and target platforms, clear migration constraints, and human review processes.&lt;/p&gt;
&lt;p&gt;The trade-offs: reviewing AI decisions takes time, complex migrations may need human judgment on priorities, and initial setup requires configuring API tokens. Not suitable for instant-execution scenarios, environments prohibiting API access, or situations where teams lack domain knowledge to evaluate AI proposals.&lt;/p&gt;
&lt;h2 id=&quot;what-is-the-roi-of-ai-assisted-infrastructure-migration&quot;&gt;What Is the ROI of AI-Assisted Infrastructure Migration?&lt;/h2&gt;
&lt;p&gt;Time savings compound quickly. Deployment speed improved 88% (5-8min to 38sec), saving ~7 minutes per deploy. At 5 deploys per day, that’s 35 minutes daily or 213 hours yearly of developer time recovered. Migration execution took 2 hours versus typical 2-3 day weekend projects.&lt;/p&gt;
&lt;p&gt;Risk avoidance delivers additional value. Zero-downtime migrations eliminate revenue loss windows. Pre-validation prevents email outages (typical cost: hours of missed customer communications). Preview deployments catch production issues before customer impact.&lt;/p&gt;
&lt;p&gt;Platform economics favor Cloudflare. The free tier (500 builds/month, unlimited bandwidth) serves most businesses. High-traffic sites may need paid plans ($20-$200/month), but deployment speed gains alone justify the cost through developer productivity.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The real ROI:&lt;/strong&gt; Developer time back for feature work, not infrastructure babysitting.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;BlueCat Networks, “What causes a DNS outage? Humans, mostly” (2024) — &lt;a href=&quot;https://bluecatnetworks.com/blog/what-causes-a-dns-outage-humans-mostly/&quot;&gt;https://bluecatnetworks.com/blog/what-causes-a-dns-outage-humans-mostly/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Cloudflare, “Change your nameservers (Full setup)” — &lt;a href=&quot;https://developers.cloudflare.com/dns/zone-setups/full-setup/setup/&quot;&gt;https://developers.cloudflare.com/dns/zone-setups/full-setup/setup/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;DNSPerf, “DNS Performance Benchmarks” — &lt;a href=&quot;https://www.dnsperf.com/&quot;&gt;https://www.dnsperf.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;DORA, “Accelerate State of DevOps Report” (2024) — &lt;a href=&quot;https://dora.dev/research/2024/dora-report/&quot;&gt;https://dora.dev/research/2024/dora-report/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;McKinsey, “Unleashing Developer Productivity with Generative AI” (2023) — &lt;a href=&quot;https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/unleashing-developer-productivity-with-generative-ai&quot;&gt;https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/unleashing-developer-productivity-with-generative-ai&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Splunk and Oxford Economics, “Downtime Costs Global 2000 Companies $400B Annually” (2024) — &lt;a href=&quot;https://www.splunk.com/en_us/newsroom/press-releases/2024/conf24-splunk-report-shows-downtime-costs-global-2000-companies-400-billion-annually.html&quot;&gt;https://www.splunk.com/en_us/newsroom/press-releases/2024/conf24-splunk-report-shows-downtime-costs-global-2000-companies-400-billion-annually.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Uptime Institute, “Annual Outage Analysis Report 2025” (2025) — &lt;a href=&quot;https://uptimeinstitute.com/about-ui/press-releases/uptime-announces-annual-outage-analysis-report-2025&quot;&gt;https://uptimeinstitute.com/about-ui/press-releases/uptime-announces-annual-outage-analysis-report-2025&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>automation</category><category>devops</category><category>infrastructure</category><category>goose</category><category>case-study</category><author>Hugues Clouâtre</author></item></channel></rss>