Your AI assistant reads documents your users never could

66% of enterprises have caught their AI systems accessing sensitive data they should not touch. Only 11% could block it automatically.

Those numbers come from Cyera’s 2025 State of AI Data Security report, a survey of 921 IT and security practitioners run with Cybersecurity Insiders. Priyanka Neelakrishnan pulled the thread on why in a recent episode of her show, and her diagnosis matches what I see in the field: this is not a tuning problem or a prompt problem. It is an architecture problem. The retrieval layer in most RAG deployments has no idea who is asking.

I have spent twenty years in enterprise identity and access, and I build retrieval-backed AI systems now, including a pgvector-backed memory store that sits behind my own agents. So I have watched this gap from both sides. The identity side spent two decades getting permissions right. The AI side threw them away at ingestion and did not notice.

Where the permissions die

Walk through what a standard RAG pipeline actually does to a document.

Start with a file in SharePoint, Confluence, Google Drive, or a wiki. That file has an ACL. Someone thought about who should read it. HR compensation bands are scoped to HR. The M&A folder is scoped to the deal team. Your identity governance program, if you have one, certifies those entitlements on a schedule.

Now the AI team connects a RAG pipeline to it. The pipeline reads the file with a service account, splits it into chunks, runs each chunk through an embedding model, and writes the vectors into a vector database: Pinecone, Weaviate, pgvector, OpenSearch, take your pick.

Ask yourself: at which step did the ACL come along?

It did not. The chunk kept the text. The vector kept the meaning. Nobody kept the permission. The document’s access control lived in the source system’s head, and the copy that matters at query time has never heard of it.

So when a user asks the assistant a question, the retriever does exactly what it was built to do: nearest-neighbor search over everything in the index. The compensation bands and the M&A memo are just vectors near a query vector. The retriever returns them, the LLM summarizes them, and the assistant hands a junior analyst a fluent answer sourced from documents they could never have opened directly.
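A minimal sketch of that identity-blind retriever, for illustration only. The Chunk type, the naive_retrieve function, and the three-dimensional hand-written vectors are all assumptions standing in for a real embedding model and vector store. The point is what is missing: the chunk has no permission field, and the search function has no parameter for who is asking.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class Chunk:
    text: str
    vector: list[float]  # toy stand-in for a real embedding
    # Note what is absent: no ACL, no groups, no sensitivity label.


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def naive_retrieve(index: list[Chunk], query_vector: list[float], k: int = 2) -> list[Chunk]:
    """Nearest-neighbor search over everything in the index.

    The caller's identity is not even a parameter, so nothing here
    could enforce a permission if it wanted to.
    """
    ranked = sorted(index, key=lambda c: cosine(c.vector, query_vector), reverse=True)
    return ranked[:k]


# Hypothetical index contents; the vectors are hand-written for the example.
index = [
    Chunk("Product FAQ: how to reset a password", [0.9, 0.1, 0.0]),
    Chunk("HR compensation bands by level", [0.1, 0.9, 0.2]),
    Chunk("M&A memo: proposed acquisition terms", [0.0, 0.3, 0.9]),
]

# A junior analyst asks about pay. The two restricted documents rank highest.
results = naive_retrieve(index, [0.1, 0.8, 0.3], k=2)
for chunk in results:
    print(chunk.text)
```

In this toy index the compensation bands and the M&A memo come back first, because similarity is the only criterion the function knows about.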

No exploit. No jailbreak. The system worked as designed. The design was the flaw.

The way this usually plays out: the pilot indexes a curated folder of twenty product docs and looks brilliant. Then production connects the whole tenant, because the whole point is that the assistant knows everything. The service account that made the pilot fast becomes the permanent identity of the pipeline. Six months later someone in a demo asks a question that pulls a board deck into the answer, and the security team finds out the index exists.

The confused deputy is back

Identity people have a name for this failure and it is older than most of the engineers shipping RAG pipelines: the confused deputy. A privileged intermediary performs an action on behalf of a less-privileged requester, using its own authority instead of the requester’s.

The RAG assistant is a textbook confused deputy. The ingestion service account was granted read access to everything, because that was the fastest way to make the demo impressive. The assistant answers every user with that same god-scoped identity. The user’s own entitlements never enter the transaction.

Cyera’s data says this is the norm, not the exception: 21% of organizations grant AI broad data access by default, and only 16% treat AI as a distinct identity class with its own policies. Just 13% claim strong visibility into how AI touches enterprise data at all.

Put those together and the 66% over-access figure stops being surprising. Most shops wired an over-privileged non-human identity to their most sensitive data stores, gave everyone in the company a natural-language interface to it, and kept no per-request identity context. Of course it over-accesses. It was never told not to.

The vector database is a data store, not an index

There is a second, quieter problem underneath the first one: teams treat the vector database as derived data, like a search index they could rebuild anytime, so it escapes the data classification and controls the source systems get.

Embeddings are not an anonymized abstraction. Published research on embedding inversion has shown that text can be reconstructed from its embeddings with high fidelity; the vec2text work out of Cornell recovered 92% of 32-token inputs exactly. If the source document was confidential, the vectors are confidential. Same data, different encoding.

That has two practical consequences. First, the vector store belongs in your data inventory with the same classification as the most sensitive document inside it. Second, access to query the store, even without going through the assistant, is access to the data. If your pgvector instance or Pinecone namespace is reachable with a shared API key that has never rotated, you have rebuilt the flat network of 2005, in vector form.

What does not fix it

Three fixes get reached for first, and none of them holds.

System-prompt guardrails. “Do not reveal confidential information” is a suggestion, not a control. The model cannot enforce a permission it has no data about, and prompt injection walks right past it. If the restricted chunk reaches the context window, the control has already failed.

Asking the LLM to filter. Some teams pass the user’s role in the prompt and ask the model to withhold what that role should not see. Now your access control decision is being made probabilistically, per token, by a component with no audit trail. No auditor will accept that, and no attacker will struggle with it.

Post-generation redaction. DLP-style scanning of the output catches known patterns such as credit card numbers and SSNs. It does not catch “summarize the acquisition memo,” because the leak is the meaning, not a string.

The common thread: all three try to bolt the control on after retrieval. The retrieval was the breach.

What actually fixes it

The fix is to put identity back into the retrieval layer, which means treating retrieval as an authorization decision. Concretely, starting with the step that carries the most weight:

1. Carry the ACL through ingestion. Every chunk gets metadata at ingestion time: source system, document ID, and the permission descriptor (groups, roles, sensitivity label). This is the load-bearing step. If the permission is not in the store, nothing downstream can enforce it.

2. Filter at query time with the user’s identity, not after. The user’s token comes in, resolves to groups and attributes, and becomes a metadata filter on the vector search itself. Restricted chunks are not retrieved and then ranked lower; they are never candidates. Most serious vector stores support metadata filtering during search. Use it as a security boundary, not a convenience feature.

3. Rerank with attribute-based access control for the hard cases. Group membership covers the coarse cut. The finer decisions (region, deal involvement, data residency, employment status of the person named in the document) are attribute problems. An ABAC-aware reranking or post-filter pass evaluates policy against the user’s attributes and the chunk’s attributes before anything enters the context window. Neelakrishnan’s analysis lands on the same point: retrieval needs to be permission-aware, not permission-hopeful.

4. Keep permissions synced. ACLs change. People leave the deal team. If your index carries yesterday’s permissions, you have a time-of-check problem. Incremental permission sync from the source systems is unglamorous and essential, and it is exactly the kind of joiner-mover-leaver plumbing IGA teams have run for years.

5. Watch the vector store like a crown-jewel database. Log retrieval events with the requesting identity, monitor for inversion-shaped access patterns (high-volume, systematic nearest-neighbor sweeps that look like extraction rather than Q&A), and rotate the store’s own credentials. Only 9% of organizations in Cyera’s survey monitor AI activity in real time. The retrieval log is where over-access becomes visible; without it you are among the 87% that lack strong visibility into how AI touches their data and cannot see the problem they almost certainly have.

6. Run the pipeline itself through the non-human identity lifecycle. The ingestion service account, the assistant’s runtime identity, the store’s credentials: provision them deliberately, scope them minimally, rotate them, audit them, kill them when the project dies. I wrote up that five-gate lifecycle in “Your AI agents are privileged identities you forgot to manage”; the RAG pipeline is just the newest identity that needs it.
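Steps 1 and 2 of the list above can be sketched in a few lines. This is a toy in-memory version, not a definitive implementation: the Chunk fields, the ingest and retrieve functions, the group names, and the document IDs are all hypothetical, and the vectors are hand-written. In a real deployment the group filter would be pushed down into the vector store’s own metadata filter rather than run in application code.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Chunk:
    text: str
    vector: tuple[float, ...]
    source_system: str
    document_id: str
    allowed_groups: frozenset[str]  # permission descriptor copied from the source ACL


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ingest(text, vector, source_system, document_id, acl_groups) -> Chunk:
    """Step 1: the ACL travels with the chunk, or the chunk is not indexed."""
    if not acl_groups:
        raise ValueError(f"no ACL for {document_id}; refusing to index")
    return Chunk(text, tuple(vector), source_system, document_id, frozenset(acl_groups))


def retrieve(index, query_vector, user_groups, k=3):
    """Step 2: filter on the caller's groups before ranking.

    Chunks the caller cannot read never become candidates, so they
    cannot be ranked, returned, or summarized.
    """
    groups = frozenset(user_groups)
    candidates = [c for c in index if c.allowed_groups & groups]
    return sorted(candidates, key=lambda c: cosine(c.vector, query_vector), reverse=True)[:k]


# Hypothetical documents and groups for the example.
index = [
    ingest("Product FAQ: how to reset a password", [0.9, 0.1, 0.0], "wiki", "kb-001", {"all-staff"}),
    ingest("HR compensation bands by level", [0.1, 0.9, 0.2], "sharepoint", "hr-001", {"hr"}),
    ingest("M&A memo: proposed acquisition terms", [0.0, 0.3, 0.9], "drive", "ma-001", {"deal-team"}),
]

query = [0.1, 0.8, 0.3]  # a question about pay
analyst_view = retrieve(index, query, {"all-staff"})
hr_view = retrieve(index, query, {"all-staff", "hr"})
print([c.document_id for c in analyst_view])
print([c.document_id for c in hr_view])
```

The same query yields different candidate sets for different callers. The analyst gets only the FAQ, even though the compensation document is a much closer match, because the compensation chunk was excluded before similarity was ever computed.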

None of this is exotic. It is the same discipline your enterprise already applies to database access, applied to a new data path that has been politely excused from it.
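To make the attribute check in step 3 and the retrieval log in step 5 concrete, here is a rough sketch under stated assumptions. The User and Chunk attributes (region, deal), the permitted policy function, the log record fields, and names such as “Project Falcon” are all invented for illustration; a real system would evaluate policy in a proper policy engine and ship the log to wherever the security team already looks.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class User:
    user_id: str
    region: str
    deals: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Chunk:
    document_id: str
    text: str
    region: str | None = None  # None means no regional restriction
    deal: str | None = None    # None means not tied to a deal


def permitted(user: User, chunk: Chunk) -> bool:
    """Any attribute set on the chunk must be matched by the user.

    Attributes left unset on the chunk impose no restriction.
    """
    if chunk.region is not None and chunk.region != user.region:
        return False
    if chunk.deal is not None and chunk.deal not in user.deals:
        return False
    return True


def abac_filter(user: User, candidates: list[Chunk], audit_log: list[str]) -> list[Chunk]:
    """Post-filter candidates and record every decision with the requesting identity."""
    allowed = []
    for chunk in candidates:
        decision = permitted(user, chunk)
        audit_log.append(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": user.user_id,
            "document": chunk.document_id,
            "decision": "allow" if decision else "deny",
        }))
        if decision:
            allowed.append(chunk)
    return allowed


audit_log: list[str] = []
analyst = User("u-analyst", region="EU")
candidates = [
    Chunk("doc-1", "EU pricing guide", region="EU"),
    Chunk("doc-2", "US payroll calendar", region="US"),
    Chunk("doc-3", "Project Falcon term sheet", deal="falcon"),
]

# Only chunks that pass policy are allowed into the context window.
context = abac_filter(analyst, candidates, audit_log)
print([c.document_id for c in context])
print(len(audit_log), "decisions logged")
```

Denials are logged as well as allows. A burst of denies against one identity, or a systematic sweep across many documents, is exactly the kind of pattern that only shows up if the decision was recorded.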

One tempting shortcut deserves a warning: segregating by collection. One index for HR, one for engineering, route users by department. It feels like access control and it does cut the worst exposure, so it is a fine first move. But it is RBAC at its coarsest, it breaks the cross-domain questions that made the assistant useful, and it says nothing about the document that is sensitive within a department. Treat it as triage, not the destination.

The Copilot lesson: permission-aware is necessary, not sufficient

Microsoft 365 Copilot is the instructive counterexample, because Microsoft did the retrieval part right. Copilot queries through the Graph with the user’s own identity and honors existing file permissions. Textbook security trimming.

And its enterprise rollouts still turned into data-exposure stories. Not because the trimming failed, but because it worked: Copilot faithfully enforced permissions that were wrong. Every “shared with everyone” site from 2019, every default-open library, every group nobody recertified became instantly discoverable through a chat box. The oversharing was always there; before Copilot, finding it required knowing where to look. After, it required typing a question.

That is the second half of the work. Identity-aware retrieval enforces your entitlements as they are. If your entitlements are a mess, the assistant becomes a very efficient tour guide to the mess. Which is why the RAG problem is an IGA problem in disguise: access reviews, recertification, and least-privilege cleanup on the source systems are now prerequisites for shipping an AI assistant safely, not a compliance chore running on its own calendar.

We already solved this once

Here is the part that makes me sigh. Enterprise search solved this problem fifteen-plus years ago and gave it a boring name: security trimming.

When SharePoint or the Google Search Appliance indexed your file shares, results were trimmed to the requesting user’s permissions at query time. Every serious enterprise search product treated that as table stakes, because the first pilot user who saw an HR document in their results would have killed the project. The pattern was proven: carry ACLs into the index, resolve the user’s identity per query, filter before ranking.

RAG rebuilt enterprise search with an LLM on top and skipped the lesson. Partly because the teams shipping it come from the ML side and never lived through the file-share era. Partly because demos reward recall, and security trimming only ever removes results. Nobody wins a hackathon by retrieving less.

But the requirement did not go away because the technology changed. It got stricter. Old search showed the user a document title they could click and get denied on. The LLM reads the document to them.

The uncomfortable question

If you have a RAG assistant in production, or a Copilot-style product wired into your document stores, the question to ask this week is simple: when the retriever runs, whose identity is it using?

If the answer is “the pipeline’s service account” and there is no per-query permission filter behind it, you are in the 66% whether you have caught it yet or not. The catching is the part that requires the retrieval log.

The good news is that this is a solved class of problem. Identity-aware retrieval is security trimming with embeddings, the confused deputy has been in the literature since 1988, and the joiner-mover-leaver sync your IGA team already runs is the same muscle. The AI stack did not create a new discipline. It created a new place where the old discipline is missing.

What would your assistant answer right now if the newest hire asked it about executive compensation?

Sources

Cyera Research Labs, 2025 State of AI Data Security Report, with Cybersecurity Insiders, 921 respondents (2025-09-29): cyera.com
Priyanka Neelakrishnan, RAG access control gap analysis, episode 21 (2026-07-04): youtube.com
Morris et al., “Text Embeddings Reveal (Almost) As Much As Text,” vec2text, Cornell (2023-10): arxiv.org