AI flywheels: what happens when workflows run themselves

🕒 Published on Zendoric: July 5, 2026 · 04:36
The article is the sixth episode of Turing Post's "The Org Age of AI" series, authored by Will Schenk and Ksenia Se, and introduces the concept of the "AI flywheel" as the next logical step after the workflows described in the previous episode of the series.
Note that the downloaded text covers only the first half of the article: at the section 'Why this reaches you on a schedule you don't control', the content is blocked behind a paywall ('UPGRADE TO READ THE REST'). Much of the final argument is therefore not available in the material received, including the detail on the types of infrastructure that are missing, how to prioritize which loops to close first, and the discussion of the 'new gap' in verification at machine speed. The summary that follows is based solely on the accessible part.
The piece builds a conceptual ladder of three rungs. At the base is the 'pipeline': a fixed, mechanical sequence of steps with no real human judgment, like a cron job or a script. One rung up is the 'workflow': a repeated sequence of decisions and actions where, at some points, a human exercises judgment; if that judgment is removed, the workflow collapses back into a pipeline. The authors recall that, according to the previous episode of the series, as a workflow matures, the human migrates from the center toward the edges: defining the parameters at the start and reviewing the exceptions at the end, while the 'center' of the process is left to the agent.
The third rung, and the central theme of this episode, is the 'flywheel': a collection of workflows chained together so that the output of one feeds the next, oriented toward a goal defined only once, spinning continuously without human intervention between iterations. A 'closed loop' is the smallest possible flywheel: a single workflow that feeds back into itself. The key difference between a pipeline and a flywheel is that the pipeline repeats, while the flywheel steers, and what it steers by is measurement: the system acts, measures the result of its own action, and uses that measurement to decide the next action. That is why a flywheel has three beats (generate, measure, decide what to try next) and not just one.
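The three beats can be sketched as a toy loop. This is only an illustrative sketch, not anything taken from the article: the objective function `measure`, the step-halving rule and the name `run_flywheel` are all assumptions chosen to keep the example small. A pipeline would apply the same step on every run; here the measurement changes what is tried next.

```python
def measure(x: float) -> float:
    """Hypothetical objective verifier: higher is better, with a peak at x = 3."""
    return -(x - 3.0) ** 2


def run_flywheel(start: float, step: float, iterations: int) -> float:
    """Toy closed loop: generate a candidate, measure it, decide what to try next."""
    current = start
    best_score = measure(current)
    for _ in range(iterations):
        candidate = current + step      # beat 1: generate
        score = measure(candidate)      # beat 2: measure
        if score > best_score:          # beat 3: decide
            current, best_score = candidate, score
        else:
            # The measurement says this direction stopped helping:
            # reverse and shrink the step instead of repeating it.
            step = -step / 2
    return current


result = run_flywheel(start=0.0, step=1.0, iterations=50)
print(result)
```

The goal (maximize `measure`) is defined once, outside the loop, and no human sits between iterations; everything the loop knows about "better" comes from the verifier.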
A central point of the article is the distinction between two ways of closing that loop. The wrong way is to remove human control and simply hope it works, typical of many 'we deployed autonomous agents' stories that end badly. The right way is to replace the human checkpoint with an automatic verifier: a test suite, a schema validation, a reconciliation against known totals, or a performance metric. In this second case, human judgment does not disappear; it is encoded once inside the verifier instead of being exercised manually on each run. The authors insist that loops are not closed because someone decides to trust the model, but because verification has become cheap, fast and objective; in all other cases, the human stays in the process. And they point out something important: the flywheel does not eliminate the human, but moves them up one more level, no longer in the middle of the work nor coordinating between workflows, but positioned over the verifier itself.
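As a rough illustration of judgment "encoded once inside the verifier", the sketch below combines two of the verifier types listed above, a schema validation and a reconciliation against a known total. The invoice schema, the field names and the `verify_batch` function are hypothetical, invented for this example.

```python
from dataclasses import dataclass

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"invoice_id": str, "amount_cents": int}


@dataclass
class Verdict:
    passed: bool
    reasons: list[str]


def verify_batch(rows: list[dict], expected_total_cents: int) -> Verdict:
    """Accept an agent-produced batch only if it passes both checks."""
    reasons = []
    # Check 1: schema validation on every row.
    for i, row in enumerate(rows):
        for name, ftype in REQUIRED_FIELDS.items():
            if not isinstance(row.get(name), ftype):
                reasons.append(f"row {i}: {name!r} missing or not {ftype.__name__}")
    # Check 2: reconciliation against a total known from an independent source.
    if not reasons:
        total = sum(row["amount_cents"] for row in rows)
        if total != expected_total_cents:
            reasons.append(f"total {total} != expected {expected_total_cents}")
    return Verdict(passed=not reasons, reasons=reasons)


batch = [
    {"invoice_id": "a", "amount_cents": 100},
    {"invoice_id": "b", "amount_cents": 250},
]
print(verify_batch(batch, expected_total_cents=350))
print(verify_batch(batch, expected_total_cents=300))
```

The rules a reviewer would otherwise apply by eye on each run live in `verify_batch`; a human reviews and maintains that function rather than each batch.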
To illustrate that these flywheels already exist today, the article offers two concrete examples. The first is agent-assisted programming: a modern coding agent not only writes code, but runs a complete experiment (writes, runs the tests, reads the failures, rewrites, runs the tests again) without anyone reviewing the intermediate iterations; the human only reviews the final diff, and in low-risk cases, not even that. The article notes that Anthropic claims most of its own code is already written by Claude Code, and that OpenAI reported in February that GPT-5.3-Codex was key to building itself, debugging its own training runs and analyzing its own evaluations. The second example is a hypothetical but plausible 'ad optimization flywheel': one workflow generates the creative (headline, copy, image), another pulls the performance from the ad console (impressions, clicks, conversions), and a third decides the next experiment (kill the loser, scale the winner, try a new angle), all running continuously with no human between iterations, because the ad console acts as an objective verifier.
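The "decide" workflow of the ad example might look something like the sketch below. It is an assumption-laden illustration: the `VariantStats` type, the use of click-through rate as the metric, and the `min_impressions` threshold are not from the article. The threshold is there because the performance figures are noisy, so variants with too little data are left running rather than judged.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    name: str
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        """Click-through rate; 0.0 when there are no impressions yet."""
        return self.clicks / self.impressions if self.impressions else 0.0


def decide_next(stats: list[VariantStats], min_impressions: int = 1000) -> dict[str, str]:
    """Map each variant to an action based on the measured performance."""
    actions = {s.name: "keep_running" for s in stats}
    # Only judge variants that have enough data to be compared.
    ready = [s for s in stats if s.impressions >= min_impressions]
    if len(ready) < 2:
        return actions
    ranked = sorted(ready, key=lambda s: s.ctr, reverse=True)
    actions[ranked[0].name] = "scale"   # scale the winner
    actions[ranked[-1].name] = "kill"   # kill the loser
    return actions


stats = [
    VariantStats("headline_a", impressions=2000, clicks=100),
    VariantStats("headline_b", impressions=2000, clicks=40),
    VariantStats("headline_c", impressions=100, clicks=50),
]
print(decide_next(stats))
```

In a full loop, each "kill" would free a slot that the generating workflow fills with a new angle, and the cycle would repeat on the next batch of console data.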
The article explains why programming was the domain where this cycle closed first: over forty years, software accumulated verification infrastructure (compilers, type systems, test suites, continuous integration) that already encoded 'what correct looks like' in executable form before LLMs arrived. Advertising has a weaker version of the same gift: objective, if noisy, performance figures. The article revisits the 'Factory AI principle' from the previous episode: the ease of training an agent on a task is proportional to how verifiable that task is; that is why the loop closed first where the work was most verifiable.
The authors carry this logic over to any organization, posing an uncomfortable question: which of your workflows have a test suite, and which have anything like one? For most companies, the honest answer is that their workflows have humans: the human is the verification layer and also the coordination layer, the one who decides which workflow runs next and whether the overall effort is working. This implies that the human is precisely the reason the flywheel cannot spin yet in most organizations: simply removing them exposes an infrastructure debt that no one had measured.
The final visible part of the article is devoted to the 'biggest loop of all': AI research itself. It mentions 'The AI Scientist' project, reported in Nature in March, which automates the research cycle end to end (generates ideas, runs experiments, writes up results and reviews its own papers). It also cites a startup called Recursive, which this week published results from a system that proposes a research idea, implements it, runs the experiment, validates the result and uses what it learned to choose the next experiment, running multiple threads over long horizons, with explicit mechanisms to detect 'reward hacking' before accepting a gain as valid. It mentions an Anthropic article titled 'When AI builds itself', published this month, which claims that a growing portion of the company's AI development is already delegated to AI systems, and that, taken to the extreme, the trend points to systems that design their own successors. It cites Jack Clark, who reportedly estimated a roughly 60% probability that, by the end of 2028, a system will exist that is capable of training a more powerful successor without human intervention. It also references an article by Dean Ball titled 'On Recursive Self-Improvement', from February, which argues that frontier labs are automating large fractions of their research operations, and that their effective headcount of agents would grow from thousands to hundreds of thousands within one or two years.
The authors themselves add notes of caution about these claims, pointing out that the labs have incentives to describe their own momentum in the most forceful terms possible, and that their forecasts might not come true on the stated timelines. However, they argue that the organizational thesis holds up just as well without assuming the arrival of a superintelligence: it is enough to rely on what is already happening (work loops closing in domains where verification is strong), combined with the observation that technological capability diffuses and travels between domains.
The text cuts off just as it promises to address 'why this reaches you on a schedule you don't control'. The available material therefore leaves undeveloped the sections on the review bottleneck in closed workflows, the three types of infrastructure that do not yet exist, the two directions in which the flywheel can spin (for good or for ill), which loops should be closed first, and the new gap between organizations according to their capacity for verification at machine speed. Nonetheless, the article's own final FAQ offers hints about these topics. It identifies the main risk of a flywheel as compounding error and reward hacking: since each run feeds the next, small errors propagate instead of remaining isolated, and the system may optimize a measured metric while losing sight of the original intent. As defenses it suggests regression evaluations, canary checks, drift detection, and always keeping a human who reviews the verifier instead of reviewing each individual output. It also hints, without developing it, that the workflows best suited to closing the loop first would be those of the synchronization-and-transformation, triage and monitoring patterns, while work based on 'good taste' or subjective judgment, such as writing outward-facing content, should be closed last, if it is closed at all.
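The accessible text does not define these defenses, so the sketch below is only one possible reading of a 'canary check': before trusting a verifier, run it against a small fixed set of cases whose correct verdict is already known, and raise an alarm if it rejects a known-good case or accepts a known-bad one. The function `canary_failures` and both example verifiers are hypothetical.

```python
from typing import Callable, Iterable


def canary_failures(
    verifier: Callable[[str], bool],
    known_good: Iterable[str],
    known_bad: Iterable[str],
) -> list[str]:
    """Return a description of every canary case the verifier gets wrong."""
    failures = []
    for case in known_good:
        if not verifier(case):
            failures.append(f"rejected known-good case: {case!r}")
    for case in known_bad:
        if verifier(case):
            failures.append(f"accepted known-bad case: {case!r}")
    return failures


def strict_verifier(text: str) -> bool:
    """Example verifier: accepts only strings that are plain digits."""
    return text.strip().isdigit()


def degraded_verifier(text: str) -> bool:
    """Example of a verifier that has silently stopped checking anything."""
    return True


good_cases, bad_cases = ["42", " 7 "], ["", "abc"]
print(canary_failures(strict_verifier, good_cases, bad_cases))
print(canary_failures(degraded_verifier, good_cases, bad_cases))
```

A check like this is a cheap way for the human who "reviews the verifier" to notice that it has weakened, which matters because a loop optimizing against a broken verifier is exactly where reward hacking compounds.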
Overall, the available fragment offers a clear and well-argued conceptual framework (pipeline, workflow, flywheel) and a powerful central thesis: the key to safe autonomy is not to eliminate the human, but to move their judgment into a robust verifier, and the strength of that verifier determines which tasks can be automated in a closed loop. For those who follow Manuel's newsletter, the immediate practical value is the question it poses to any organization: which workflows already have a verifier equivalent to a test suite, and which still depend entirely on human judgment as the only layer of control? That question, more than the figures about AI labs, seems to be the real actionable core of the article, though learning the authors' concrete recommendations on infrastructure and prioritization requires access to the full paid version.