A critical heap out-of-bounds read in Ollama’s GGUF model loader, tracked as CVE-2026-7482 and nicknamed Bleeding Llama by Cyera Research, lets an unauthenticated attacker with network access to the Ollama API exfiltrate the server’s process memory. The bug was disclosed May 5 and has been picked up across the security press over the past 24 hours, after CERT/CC published VU#518910 and runZero, SecurityWeek, and The Hacker News expanded coverage. It is rated CVSS 9.1 and affects every Ollama version prior to 0.17.1. Internet-scan data cited by Cyera and CSO Online puts the exposed surface at more than 300,000 servers globally, the bulk of them running default configurations with no authentication in front of the API.

What goes wrong

Ollama’s GGUF parser trusts the tensor metadata embedded in uploaded model files. In fs/ggml/gguf.go and server/quantization.go, the WriteTo() path uses the declared tensor.offset and tensor.size fields to slice into a heap buffer that’s sized to the actual file. There’s no check that offset + size fits inside the file. When an attacker uploads a forged GGUF whose header advertises a tensor much larger than the real payload, the quantizer happily reads past the buffer and into whatever follows it on the Go heap.
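
The missing check is small. The sketch below is not Ollama's code; tensorInfo and tensorBytes are hypothetical names used only to illustrate validating a declared offset and size against the real buffer length before slicing, with the comparison arranged so that offset + size cannot wrap around.

```go
package main

import (
	"errors"
	"fmt"
)

// tensorInfo is a hypothetical stand-in for the metadata a GGUF header
// declares for one tensor. The field names are illustrative only.
type tensorInfo struct {
	Offset uint64 // declared start of the tensor data
	Size   uint64 // declared length of the tensor data
}

var errTensorOutOfBounds = errors.New("declared tensor range exceeds file data")

// tensorBytes returns the declared range only if it lies entirely inside
// data. Comparing Size against len(data)-Offset avoids computing
// Offset+Size, which could overflow for attacker-chosen values.
func tensorBytes(data []byte, t tensorInfo) ([]byte, error) {
	n := uint64(len(data))
	if t.Offset > n || t.Size > n-t.Offset {
		return nil, errTensorOutOfBounds
	}
	return data[t.Offset : t.Offset+t.Size], nil
}

func main() {
	data := make([]byte, 64) // stands in for the real file contents

	// A forged header that advertises far more data than the file holds.
	_, err := tensorBytes(data, tensorInfo{Offset: 32, Size: 1 << 20})
	fmt.Println("forged tensor:", err)

	// A tensor that genuinely fits.
	b, err := tensorBytes(data, tensorInfo{Offset: 32, Size: 32})
	fmt.Println("valid tensor:", len(b), err)
}
```

Rejecting the file outright, rather than clamping the range, keeps a malformed header from ever reaching later processing steps.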

The exploit chain is short and entirely remote:

  1. POST /api/create with a crafted GGUF that declares an oversized tensor. No authentication is required on a default install.
  2. Ollama runs the quantization step. The OOB read copies adjacent heap memory into the output tensor.
  3. POST /api/push to a registry under the attacker’s control. The “model” they receive is mostly leaked server memory baked into tensor bytes.

Repeat with different offsets/sizes to walk the heap. Because Ollama keeps active conversations, system prompts, and decoded request bodies in the same address space, the bytes that come back are immediately useful.

What an attacker walks away with

CERT/CC’s VU#518910 and Cyera’s writeup both flag the same data classes recovered from real test instances: process environment variables (the most common place teams stash OPENAI_API_KEY, registry tokens, and database URLs), system prompts for hosted models, in-flight user prompts and responses from other tenants of the same server, and any API keys or bearer tokens that touched the process recently. On servers exposed to the public internet — and there are hundreds of thousands of those — none of this requires a user account, a session, or even a model that exists on the box.

Why the blast radius is large

Ollama ships with the API bound to 0.0.0.0 in several common Docker images and self-hosted recipes, and the default install has no auth layer. AI platform teams have spent the last 18 months racing to stand up local inference for cost and privacy reasons, and Ollama has been the path of least resistance, which is exactly how 300,000 internet-exposed instances happened. CSO Online’s coverage emphasizes that this is the same pattern that bit ComfyUI and LM Studio operators earlier this year: AI infra deployed with web-app threat models from 2014.
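
To make the bind-address point concrete, here is a minimal sketch that classifies a host:port listen address as loopback, private, or reachable on every interface. The exposure function and its labels are hypothetical, and 11434 is used as the port on the assumption that the instance runs on Ollama's usual default. It only inspects the address string; it says nothing about firewalls or port mappings in front of the host.

```go
package main

import (
	"fmt"
	"net"
)

// exposure classifies a host:port listen address. An empty host or an
// unspecified IP (0.0.0.0 or ::) means the service listens on all interfaces.
func exposure(listenAddr string) string {
	host, _, err := net.SplitHostPort(listenAddr)
	if err != nil {
		return "invalid"
	}
	if host == "" {
		return "all-interfaces"
	}
	if host == "localhost" {
		return "loopback"
	}
	ip := net.ParseIP(host)
	switch {
	case ip == nil:
		return "unknown" // some other hostname; not resolved in this sketch
	case ip.IsUnspecified():
		return "all-interfaces"
	case ip.IsLoopback():
		return "loopback"
	case ip.IsPrivate():
		return "private"
	default:
		return "public"
	}
}

func main() {
	for _, addr := range []string{"0.0.0.0:11434", "127.0.0.1:11434", "10.0.0.5:11434"} {
		fmt.Printf("%-18s %s\n", addr, exposure(addr))
	}
}
```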

What to do right now

Upgrade to Ollama 0.17.1 or later. The patch landed via PR #14406 and adds (a) a file-size sanity check in the GGUF loader and (b) a tensor data length check in the quantizer. Either gate alone would have killed this bug; both are now in.
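
For fleet audits, comparing version strings textually is a common trap ("0.9.9" sorts after "0.17.1" as text). The sketch below compares numerically against the 0.17.1 fix version named above. The function names are hypothetical, the version string is assumed to have been collected separately, and anything that is not a plain MAJOR.MINOR.PATCH string is deliberately returned as an error so that pre-release or custom builds get a manual look.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns a strict "MAJOR.MINOR.PATCH" string (optionally with a
// leading "v") into three integers. Suffixes such as "-rc1" are rejected.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(strings.TrimSpace(v), "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("unexpected version %q", v)
		}
		out[i] = n
	}
	return out, nil
}

// isPatched reports whether v is numerically at least 0.17.1.
func isPatched(v string) (bool, error) {
	got, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	fixed := [3]int{0, 17, 1}
	for i := range got {
		if got[i] != fixed[i] {
			return got[i] > fixed[i], nil
		}
	}
	return true, nil // exactly the fixed version
}

func main() {
	for _, v := range []string{"0.9.9", "0.17.0", "0.17.1", "0.17.1-rc1"} {
		ok, err := isPatched(v)
		fmt.Println(v, ok, err)
	}
}
```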

If you can’t upgrade in the next maintenance window:

  • Bind the Ollama API to 127.0.0.1 or a private interface only (the listen address is set through the OLLAMA_HOST environment variable). There is no good reason to expose /api/create to the public internet.
  • Put an authenticating reverse proxy (Caddy, nginx with basic auth, or an OAuth proxy) in front of the API and block /api/create and /api/push for unauthenticated callers.
  • Rotate every secret that has ever lived in the Ollama process’s environment — OPENAI_API_KEY, HuggingFace tokens, model-registry creds, and any database URL. Assume disclosure on any host that was internet-reachable since early May.
  • Hunt for POST /api/create followed shortly by POST /api/push to non-allowlisted destinations in your access logs.

This is the third Ollama CVE in a week per Mondoo’s tracker — the project is now squarely in the crosshairs, and the next two issues (Windows updater path) are still embargoed. If you run Ollama in production, this is the moment to treat it like any other exposed service: pin a version, watch the advisories, and keep the API off the open internet.

References