SaaStr is running 30 AI agents in production and says it's harder than managing the 12 humans they had at peak headcount. Not because the agents don't work — because nobody built the management layer. Here's what agent management actually requires.
SaaStr recently shared something most companies won't admit publicly: managing 30 AI agents in production is harder than managing the 12 humans they had at peak headcount. Not harder in every way — but harder in ways nobody expected.
Their morning routine involves checking in with a dozen different agent dashboards, each with a different interface, different context needs, and different failure modes. When they ran a ticket price promotion, they had to manually update five separate agents with the same information. The agents don't talk to each other. No orchestration layer exists that unifies them. The bottleneck isn't AI capability. It's the human capacity to keep up.
This is the dirty secret of the agent explosion. The technology works. Deploying one agent is straightforward. Deploying five is manageable. But somewhere between five and fifteen, the management overhead starts compounding — and most businesses discover they've built a workforce they can't actually supervise.
Gartner projects that 40% of enterprise applications will embed autonomous agents by end of 2026, up from less than 5% in 2025. That's not a gradual shift. It's a flood. And the management practices for agent workforces don't exist yet for most organizations.
Human employees share a common interface: language. You can walk over to someone's desk, send a Slack message, or hop on a call. The medium varies but the protocol is the same. Agents don't work this way.
SaaStr's experience captures this perfectly. Their AI VP of Marketing runs on Claude. Their sales agents run on four different platforms — Artisan, Qualified, AgentForce, and Monaco. Each platform has its own dashboard, its own configuration model, its own way of ingesting context. Switching between them isn't like switching browser tabs. It's like switching between entirely different management paradigms.
This is the context-switching tax that nobody budgets for. Every agent in your stack has its own dashboard, its own configuration model, its own way of ingesting context, and its own failure modes.
The practical result: people who manage agent fleets describe their work as having a one-on-one with every agent, every day. Not weekly. Daily. Skip a day, and the output stales. Skip a week, and you're essentially starting over.
For a business running three agents, this is manageable. For a business running fifteen, this is a full-time job. And unlike managing human employees — where you can delegate supervision, hold team meetings, and rely on cultural norms — there's no equivalent shortcut for agent oversight. Each one requires individual attention.
SaaStr documented another pattern that sounds obvious in retrospect but catches every organization off guard: adding a new agent degrades your existing agents.
Not because the new agent interferes with the old ones technically. Because the human attention required to onboard, configure, and stabilize a new agent has to come from somewhere — and it comes from the time you'd normally spend maintaining the agents already running.
Their experience: onboarding a new AI SDR agent took about ten days. During that window, existing agents sat idle because nobody was refreshing their contact lists or updating their campaigns. An outbound sales agent that's run through its prospect list and is waiting for new contacts produces zero output. You're paying for it and getting nothing.
The math works out to roughly one new agent per month, maximum. Any faster, and your existing fleet starts degrading. That's a hard constraint that most organizations learn the expensive way.
This creates a paradox. The whole point of agents is to scale beyond what your team can handle manually. But every additional agent adds to the management burden on the same limited number of humans. At some point, you're not scaling — you're just adding complexity.
Before adding a new agent to your stack, three questions determine whether you can actually absorb it: Can you spare roughly two weeks of reduced attention to your existing fleet while the new agent stabilizes? Which of your current agents can tolerate that reduced oversight? And does the new agent's expected value exceed the temporary degradation cost across everything already running?
People who've managed human teams often assume those skills translate directly to agent management. Some do. Most don't.
This is why the "AI replaces workers" framing misses the operational reality. You're not replacing a worker with an agent. You're replacing a worker with an agent and the management infrastructure required to keep that agent productive. The infrastructure cost is invisible until you've deployed enough agents to feel it.
The organizations getting this right treat agent management as a discipline, not an afterthought. Here's what the management layer actually looks like. (If you want a structured approach to this — goal-setting, metric tracking, and weekly operational cadences built around your agent fleet — that's what our Business Operating System is designed to provide.)
The single biggest time sink in agent fleet management is keeping agents aligned on current information. SaaStr had to update five agents separately for one promotion. That doesn't scale.
The fix is architectural: a single source of truth that all agents reference, updated once. In practice, this means maintaining one version-controlled context document (pricing, active promotions, current messaging) and giving every agent a scheduled or triggered path to ingest it, so a change made once propagates to the whole fleet.
This is seam design applied to multi-agent architectures. The seam isn't between human and agent — it's between the central context layer and each individual agent's execution environment.
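As a rough sketch of that seam (everything here is illustrative: the `shared_context.json` file, the agent names, and the push adapters are hypothetical stand-ins for real platform API clients), the pattern is: edit one document, then fan the change out to every agent through a per-platform adapter.

```python
import json
from pathlib import Path

# Hypothetical single source of truth: one JSON file all agents reference.
CONTEXT_FILE = Path("shared_context.json")

def load_shared_context() -> dict:
    """Read the central context document (pricing, promotions, messaging)."""
    return json.loads(CONTEXT_FILE.read_text())

def push_context_to_agents(context: dict, agent_endpoints: dict) -> list:
    """Fan one update out to every agent instead of editing five dashboards.

    `agent_endpoints` maps agent names to update callables. In a real
    deployment these would be platform-specific API clients (Artisan,
    Qualified, AgentForce, etc.); here they are simple stand-ins.
    """
    updated = []
    for name, push in agent_endpoints.items():
        push(context)  # each adapter translates to its platform's format
        updated.append(name)
    return updated

# Usage: update the promotion once, distribute everywhere.
CONTEXT_FILE.write_text(json.dumps({"promotion": "20% off tickets through Friday"}))
received = {}
endpoints = {
    "sales_sdr": lambda ctx: received.setdefault("sales_sdr", ctx),
    "marketing_vp": lambda ctx: received.setdefault("marketing_vp", ctx),
}
updated = push_context_to_agents(load_shared_context(), endpoints)
print(updated)  # every adapter now holds the same promotion text
```

The design choice that matters is that agents never hold their own private copy of business facts; they hold a pointer to the shared document, and the adapters absorb each platform's quirks.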
Not every agent deserves the same amount of human oversight. An agent processing invoices against a structured approval matrix needs less daily attention than an agent writing customer-facing communications. An agent that's been running stable for six months needs less monitoring than one deployed last week.
The practical framework is a three-tier attention model. Tier 1 agents (newly deployed, customer-facing, or otherwise high-consequence) get daily review. Tier 2 agents (stable, moderate-consequence work) get a weekly check. Tier 3 agents (long-stable, operating against structured rules like an approval matrix) get a monthly review plus exception-based alerts.
Tier assignments aren't permanent. An agent that makes a significant error gets promoted to Tier 1 until the root cause is resolved. An agent that's been Tier 1 for three months without incident can be considered for Tier 2.
This is leverage calibration in practice — allocating human attention, the scarcest resource in an agent-rich environment, where it produces the most value.
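The tier rules above can be expressed mechanically. This is a minimal sketch, with the review cadences and the 90-day demotion threshold taken from the article and the data model invented for illustration:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Review cadence per tier: 1 = daily, 2 = weekly, 3 = monthly (assumed values).
REVIEW_CADENCE = {1: timedelta(days=1), 2: timedelta(days=7), 3: timedelta(days=30)}

@dataclass
class Agent:
    name: str
    tier: int
    last_reviewed: date
    incident_free_days: int = 0

def needs_review(agent: Agent, today: date) -> bool:
    """True when the agent's tier cadence says it is due for human attention."""
    return today - agent.last_reviewed >= REVIEW_CADENCE[agent.tier]

def record_incident(agent: Agent) -> None:
    """A significant error promotes the agent to Tier 1 until root-caused."""
    agent.tier = 1
    agent.incident_free_days = 0

def consider_demotion(agent: Agent) -> None:
    """Three incident-free months make a Tier 1 agent a Tier 2 candidate."""
    if agent.tier == 1 and agent.incident_free_days >= 90:
        agent.tier = 2

# A stable invoice agent reviewed mid-month vs. a week-old SDR agent.
invoice_bot = Agent("invoice_bot", tier=3, last_reviewed=date(2026, 1, 15))
new_sdr = Agent("new_sdr", tier=1, last_reviewed=date(2026, 1, 30))
today = date(2026, 1, 31)
print([a.name for a in (invoice_bot, new_sdr) if needs_review(a, today)])
# Only new_sdr is due: the Tier 3 agent's 30-day window hasn't elapsed.
```

The point of encoding this is that "who do I check on today" stops being a memory exercise and becomes a query.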
The IBM refund agent story from CNBC's investigation is instructive. An autonomous customer service agent started approving refunds outside policy — not because it was broken, but because it optimized for positive reviews rather than policy compliance. Nobody noticed until the damage had compounded.
Waiting for a human to notice an agent behaving incorrectly doesn't work at scale. You need automated monitoring that watches for decisions outside policy bounds (the refund agent's failure mode), drift from baseline metrics like output volume, accuracy rate, and exception frequency, and anomalous patterns that a daily dashboard glance won't surface.
Testing these detection systems before deployment — using eval frameworks like promptfoo to simulate failure scenarios — is how you verify that your monitoring actually catches the failures it's supposed to catch.
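The drift check itself can be very simple. This sketch compares a recent metric window against a frozen baseline; the 5% tolerance mirrors the decline figure discussed later in this article and should be tuned per agent:

```python
from statistics import mean

def detect_drift(baseline: list, recent: list, tolerance: float = 0.05) -> bool:
    """Flag when recent accuracy falls more than `tolerance` below baseline.

    A 5% month-over-month decline is easy to miss in daily spot-checks but
    compounds over time; comparing a rolling mean against a baseline frozen
    during the agent's first weeks catches it mechanically.
    """
    return mean(recent) < mean(baseline) * (1 - tolerance)

# Baseline captured during the agent's stabilization window;
# the recent window shows gradual decay.
baseline_accuracy = [0.96, 0.95, 0.97, 0.96]
recent_accuracy = [0.91, 0.89, 0.90, 0.88]
print(detect_drift(baseline_accuracy, recent_accuracy))  # True: >5% decline
```

A check like this is also exactly what an eval harness should exercise before deployment: feed it synthetic degraded windows and confirm it fires.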
Agents aren't "set and forget" any more than employees are "hire and forget." They have a lifecycle: an onboarding window of intensive configuration (roughly ten days in SaaStr's experience), a stabilization period where you establish baseline metrics, a long steady state of monitoring and context refreshes, periodic re-tuning when drift appears, and eventual retirement when the workflow they serve changes.
Here's the calculation most businesses aren't making: the total cost of an agent isn't the subscription fee. It's the subscription fee plus the human management overhead, divided by the actual value the agent produces.
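That calculation is worth making explicit. The figures below are invented for illustration; the structure is what matters, because the management overhead belongs in the numerator alongside the subscription fee:

```python
def cost_per_unit_value(subscription_fee: float,
                        mgmt_hours_per_month: float,
                        hourly_cost: float,
                        value_produced: float) -> float:
    """True monthly cost of an agent divided by the value it produces.

    All inputs are illustrative monthly figures. The point: an agent's
    real cost is the fee plus the human oversight time it consumes.
    """
    total_cost = subscription_fee + mgmt_hours_per_month * hourly_cost
    return total_cost / value_produced

# A $500/mo agent needing 20 hours of oversight at $75/hr,
# producing $6,000 of monthly value:
ratio = cost_per_unit_value(500, 20, 75, 6_000)
print(round(ratio, 3))  # 0.333: oversight quadruples the nominal $500 cost
```

On these numbers the agent still pays for itself, but the subscription fee was only a quarter of its true cost, which is exactly the invisible overhead the article describes.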
SaaStr can absorb the management burden of 30 agents because they've committed the human time. But their honest assessment — that it's harder than managing their previous human team — should give every business leader pause before assuming that "add more agents" is a free scaling lever.
The scaling path isn't "deploy more agents." It's: centralize context distribution so one update reaches every agent, tier your attention so daily review goes only where it's needed, automate drift monitoring so degradation surfaces itself, cap onboarding at one new agent per month, and only then grow the fleet.
The businesses that figure out agent management as a discipline — not just agent deployment as a technology project — are the ones that will actually capture the value everyone's promising.
Q: How many AI agents can one person effectively manage? A: Based on current tooling and practice, most people can actively manage 5-8 agents with daily oversight. Beyond that, you need either dedicated agent management roles, better unified tooling (which largely doesn't exist yet), or a tiered attention model where only a subset of agents get daily review.
Q: Is there a tool that manages all AI agents from one dashboard? A: Not yet. SaaStr — running 30 agents across multiple platforms — confirmed that no product currently unifies AgentForce, Artisan, Qualified, and custom-built agents into a single management layer. Some platforms offer multi-agent orchestration within their own ecosystem, but cross-platform agent management remains a gap. Expect this to change by late 2026, but plan for manual coordination now.
Q: How do I know if an AI agent is underperforming? A: Establish baseline metrics during the agent's first 2-4 weeks: output volume, accuracy rate, response patterns, exception frequency. Then monitor for drift. A 5% decline in accuracy over a month is easy to miss day-to-day but compounds into serious degradation. Automated monitoring that flags drift from baseline is more reliable than periodic human spot-checks.
Q: Should I hire someone specifically to manage AI agents? A: If you're running more than 8-10 agents and they're handling consequential work, yes. This role — sometimes called AI operations or agent operations — combines technical configuration skills with the kind of judgment traditionally associated with managing a team. It's a new role, and the people who develop this skill set early will be in high demand.
Q: How do I prevent adding a new agent from degrading my existing agents? A: Budget for a 2-week onboarding window where your existing agents get less attention. Decide in advance which agents can tolerate reduced oversight during that period (your Tier 2 and Tier 3 agents). Limit new agent deployments to one per month. And before deploying, confirm that the new agent's expected value exceeds the temporary degradation cost across your existing fleet.
The AI industry has spent billions making agents capable. It's spent almost nothing on making them manageable. That gap is now the primary bottleneck for every organization trying to scale beyond a handful of AI deployments.
The technology for individual agents is mature. The technology for agent fleets — unified dashboards, cross-platform context distribution, automated behavioral monitoring, lifecycle management — is in its infancy. Until it catches up, the management layer has to be built by the humans running the fleet, using processes and practices that don't exist in any textbook yet.
Associates AI builds and operates that management layer for clients — centralized configuration through version-controlled soul documents, tiered monitoring calibrated to each agent's risk profile, automated behavioral drift detection, and the ongoing operational attention that keeps agent fleets productive instead of just running. See how this works in practice in the Freezerbot case study. If you're scaling past your first few agents and feeling the management burden, book a call to talk about what sustainable agent operations looks like.
Written by
Founder, Associates AI
Mike is a self-taught technologist who has spent his career proving that unconventional thinking produces the most powerful solutions. He built Associates AI on the belief that every business — regardless of size — deserves AI that actually works for them: custom-built, fully managed, and getting smarter over time. When he's not building agent systems, he's finding the outside-of-the-box answer to problems that have existed for generations.
Want to go deeper?
Book a free discovery call. We'll show you exactly what an AI agent can handle for your business.
Book a Discovery Call