AI at the Wheel: An LLM Agent Ran a Full Cloud Intrusion in Under an Hour

We have been talking about AI-assisted attacks in the abstract for two years. Sysdig’s Threat Research Team (TRT) just put a real one on the table. In an intrusion observed on May 10, 2026 and detailed in a writeup published this week, the post-exploitation phase was not run by a human at a keyboard or a pre-written script — it was driven, step by step, by a large language model agent making decisions on the fly. It is one of the first publicly documented cases of an LLM agent operating an entire post-compromise chain in the wild.

What Happened

The entry point was mundane: an internet-reachable marimo notebook vulnerable to CVE-2026-39987, a critical pre-authentication remote code execution flaw affecting all marimo versions up to and including 0.20.4 (fixed in 0.23.0). What happened after the foothold is the story. From initial code execution to a full dump of an internal PostgreSQL database, the whole intrusion ran end-to-end in roughly an hour, across four distinct pivots.
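For teams auditing their own deployments, the version boundary above can be turned into a quick check. This is a minimal sketch, assuming conservatively that any release below 0.23.0 should be upgraded; the names FIXED_VERSION, parse_version, and needs_upgrade are illustrative and not part of marimo.

```python
import re

# First fixed release, per the advisory details quoted above.
FIXED_VERSION = (0, 23, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Parse the leading dotted-integer part of a version string.

    '0.20.4' -> (0, 20, 4). Pre-release suffixes such as 'rc1' are ignored,
    which is a simplification of this sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def needs_upgrade(installed: str) -> bool:
    """Return True when the installed version sorts below the fixed release."""
    parsed = parse_version(installed)
    # Pad short versions such as '1.0' so tuple comparison is like-for-like.
    padded = parsed + (0,) * (3 - len(parsed))
    return padded < FIXED_VERSION
```

On a host where marimo is installed, the input string could come from the standard library call importlib.metadata.version("marimo"). A check like this only confirms the patch level; it says nothing about whether the notebook is still reachable from the internet.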

The Four Pivots

According to Sysdig, the agent:

1. Compromised the internet-facing marimo notebook via CVE-2026-39987 and gained command execution on the host.
2. Harvested two sets of cloud credentials from the compromised host.
3. Replayed those credentials through a fanned-out egress pool (traffic spread across multiple source IPs) to reach AWS Secrets Manager and pull an SSH private key.
4. Used that key to open eight short SSH sessions against a downstream SSH bastion, then dumped the schema and full contents of an internal PostgreSQL database in under two minutes.
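The credential replay step, one set of credentials used from several source IPs in quick succession, lends itself to a simple log check. The sketch below assumes records shaped like AWS CloudTrail events (the eventTime, sourceIPAddress, and userIdentity.accessKeyId fields); the function name, the five-minute window, and the three-IP threshold are hypothetical choices for illustration, not values from the Sysdig report.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse_time(value: str) -> datetime:
    """Parse a CloudTrail-style UTC timestamp such as '2026-01-01T12:00:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def fanned_out_keys(events, window=timedelta(minutes=5), min_ips=3):
    """Return {access_key_id: sorted source IPs} for keys seen from at least
    `min_ips` distinct IPs inside any sliding `window`."""
    hits_by_key = defaultdict(list)
    for event in events:
        key = event.get("userIdentity", {}).get("accessKeyId")
        ip = event.get("sourceIPAddress")
        if key and ip:
            hits_by_key[key].append((parse_time(event["eventTime"]), ip))

    flagged = {}
    for key, hits in hits_by_key.items():
        hits.sort()
        start = 0
        for end in range(len(hits)):
            # Shrink the window from the left until it spans at most `window`.
            while hits[end][0] - hits[start][0] > window:
                start += 1
            ips = {ip for _, ip in hits[start:end + 1]}
            if len(ips) >= min_ips:
                flagged[key] = sorted(ips)
                break
    return flagged
```

Legitimate workloads behind NAT pools or multi-region deployments can also trip a rule like this, so the thresholds would need tuning against a known-good baseline.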

Four hops, from a single exposed notebook to the crown-jewel database, with cloud secrets and a bastion host in between. That is a textbook lateral-movement chain — the unusual part is what was steering it.

How They Know an Agent Drove It

Sysdig laid out several tells that separate this from a human operator or a static toolkit. The agent improvised a database dump with no prior schema knowledge, enumerating tables and then immediately going after a credentials table that did not actually exist in the application the schema resembled: a confident wrong guess, followed by adaptation. A planning comment in Chinese, roughly “see what else we can do,” appeared inline in the command stream, the kind of natural-language reasoning artifact a model leaks, not something a human pastes mid-attack. Commands were dispatched across six IPs at sub-second pace, faster than any human can type. And every command was shaped for machine parsing, with structured separators, bounded output limits, and discarded error streams, so the agent could cleanly read each result before choosing the next move.
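The last tell, commands shaped for machine parsing, can be approximated with a rough heuristic over captured command lines. The sketch below is an illustration only: the regular expressions are assumptions about what separators, output bounds, and discarded error streams commonly look like in a shell, not patterns taken from the Sysdig report, and the names TRAITS, machine_shaped_traits, and looks_machine_shaped are hypothetical.

```python
import re

# Assumed shell idioms for each trait; tune these against real telemetry.
TRAITS = {
    # Error stream thrown away, e.g. `cmd 2>/dev/null`
    "discards_stderr": re.compile(r"2>\s*/dev/null"),
    # Output capped, e.g. `| head -n 50` or a SQL `LIMIT 100`
    "bounds_output": re.compile(r"\|\s*(?:head|tail)\b|\bLIMIT\s+\d+", re.IGNORECASE),
    # Delimiter echoed between results, e.g. `echo "---ENV---"`
    "prints_separator": re.compile(r"\becho\s+['\"]?[-=#]{3,}"),
}


def machine_shaped_traits(command: str) -> list[str]:
    """Return the sorted names of the traits present in one command line."""
    return sorted(name for name, pattern in TRAITS.items() if pattern.search(command))


def looks_machine_shaped(command: str, min_traits: int = 2) -> bool:
    """Flag a command that shows at least `min_traits` of the traits."""
    return len(machine_shaped_traits(command)) >= min_traits
```

Plenty of legitimate automation writes commands this way, so a match is a weak signal on its own. It becomes more interesting when combined with the other tells, such as sub-second dispatch across several source IPs.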

Why This Should Worry Infrastructure Teams

The uncomfortable takeaway is about detection, not this one CVE. Signature- and IoC-based defenses lean on repetition: a known User-Agent, a fixed command order, a reused tool, predictable timing. An LLM agent breaks every one of those assumptions. It does not reuse patterns across targets, it improvises ordering, and it adapts to whatever it finds. Sysdig’s argument is that what survives this shift is behavioral detection built around what the attacker is trying to accomplish — credential access, secrets-manager reads, mass database egress — rather than how they happen to spell it on any given run.
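To make the idea of outcome-based detection concrete, here is a minimal sketch that ignores how a command was spelled and looks only at which outcomes an actor reached within a time window. Everything in it is an assumption for illustration: the Event record, the action labels in OUTCOMES, and the one-hour window are hypothetical, and real telemetry would need to be normalized into this shape first.

```python
from dataclasses import dataclass

# Hypothetical mapping from normalized action labels to attacker outcomes.
OUTCOMES = {
    "read_credentials_file": "credential_access",
    "GetSecretValue": "secrets_read",
    "bulk_table_read": "mass_db_egress",
}


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some shared epoch
    actor: str        # host, identity, or session the event is attributed to
    action: str       # normalized action label


def chained_outcomes(events, window_seconds=3600.0, min_outcomes=2):
    """Return {actor: sorted outcomes} for actors that reached at least
    `min_outcomes` distinct outcomes within `window_seconds`."""
    alerts = {}
    recent_by_actor = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        outcome = OUTCOMES.get(event.action)
        if outcome is None:
            continue
        recent = recent_by_actor.setdefault(event.actor, [])
        recent.append((event.timestamp, outcome))
        # Keep only outcomes that are still inside the window.
        recent[:] = [(t, o) for t, o in recent if event.timestamp - t <= window_seconds]
        seen = {o for _, o in recent}
        if len(seen) >= min_outcomes and len(seen) > len(alerts.get(event.actor, [])):
            alerts[event.actor] = sorted(seen)
    return alerts
```

The hard part in practice is attribution: in the intrusion described here the activity crossed a notebook host, cloud APIs, a bastion, and a database, so tying those events to one actor is itself a correlation problem this sketch does not solve.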

What To Do Right Now

Patch marimo to 0.23.0 or later, and get marimo notebooks off the public internet entirely — they are interactive code-execution surfaces and have no business being directly reachable. Beyond that, this intrusion is a checklist of controls that would each have broken the chain: scope cloud credentials tightly and rotate anything that touches a notebook host; lock down AWS Secrets Manager with least-privilege IAM and alert on unusual GetSecretValue calls; treat SSH bastions as high-value targets with short-lived certificates instead of long-lived private keys; and monitor for bulk database reads regardless of source. Assume the attacker can improvise — build detections around outcomes (secrets accessed, databases dumped) instead of fragile indicators.
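As one example of alerting on unusual GetSecretValue calls, the sketch below filters CloudTrail-shaped records for Secrets Manager reads and reports any whose principal and source IP pair is not in a known-good baseline. The record fields (eventSource, eventName, sourceIPAddress, userIdentity.arn, requestParameters.secretId) follow the CloudTrail record format; the function name and the baseline input are hypothetical, and how that baseline gets built is left open.

```python
def unusual_secret_reads(records, baseline):
    """Return findings for GetSecretValue calls outside the baseline.

    `baseline` is a set of (principal_arn, source_ip) pairs observed during
    normal operation. It is an assumed input for this sketch.
    """
    findings = []
    for record in records:
        if record.get("eventSource") != "secretsmanager.amazonaws.com":
            continue
        if record.get("eventName") != "GetSecretValue":
            continue
        principal = record.get("userIdentity", {}).get("arn", "unknown")
        source_ip = record.get("sourceIPAddress", "unknown")
        if (principal, source_ip) in baseline:
            continue
        findings.append({
            "principal": principal,
            "source_ip": source_ip,
            # requestParameters can be null in a record, hence the `or {}`.
            "secret_id": (record.get("requestParameters") or {}).get("secretId"),
            "time": record.get("eventTime"),
        })
    return findings
```

A baseline keyed on principal and source IP is deliberately strict. It would have surfaced the fanned-out reads in this intrusion, at the cost of extra noise wherever legitimate callers change addresses often.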

References

Sysdig Threat Research Team: “AI agent at the wheel: how an attacker used LLMs to move from a CVE to an internal database in 4 pivots”
The Hacker News: “Attackers Use LLM Agent for Post-Exploitation After Marimo CVE-2026-39987 Exploit”