After Heppner: Why Legal AI Has to Run on Your Own Machine
A new S.D.N.Y. ruling just told defendants their AI conversations are not privileged. Here is what that means for anyone using cloud AI to work on a case — and how Sound Suite is built so the question never arises.
The Ruling Every Litigator Should Have Read Last Month
On February 17, 2026, Judge Jed S. Rakoff of the Southern District of New York issued a short written opinion in United States v. Heppner, No. 1:25-cr-00503-JSR (S.D.N.Y. 2026), 2026 WL 436479. The Harvard Law Review covered it here. The commentary has been unusually consistent across the defense and civil bars: this is the first federal ruling on whether what you type into a consumer AI product is protected by the attorney-client privilege or the work-product doctrine. The answer, on these facts, was no.
I build Sound Suite, a local-only document intelligence platform aimed at pro se litigants and small practices that cannot absorb the cost of discovery software or a cloud-AI contract with zero-retention terms. Heppner is the fact pattern I have been warning our earliest users about for months. The opinion deserves a careful read, and it deserves an honest explanation of which architectural choices actually avoid the trap the defendant walked into, and which ones only look like they do.
What Bradley Heppner Actually Did
The defendant, Bradley Heppner, is a financial-services executive charged with securities fraud in the Southern District of New York. The sequence of events that produced the ruling is the part that matters:
- He received a grand jury subpoena.
- He retained counsel.
- After both of those things, he opened the consumer version of Anthropic's Claude — not ChatGPT, and not an enterprise-tier product with a confidentiality contract — and began a series of conversations about his own legal exposure, drawing on information he had learned from his lawyers.
- Out of those sessions, he produced what the court described as "reports that outlined defense strategy." Commentators have cited a figure of roughly thirty-one generated documents.
- Federal agents later seized his devices in the ordinary course of the criminal investigation. The Claude-generated files came with them.
- The government moved for a pre-trial ruling that the materials were not privileged. Judge Rakoff granted the motion from the bench on February 10 and issued a written opinion a week later.
Two things are worth pausing on. First, the government never had to go to Anthropic. The documents were on the defendant's own hard drive, so the Stored Communications Act, the Fourth Amendment, and the third-party doctrine in its Carpenter sense are not doing any work in this opinion. This is a pure privilege and work-product case. Second, counsel did not direct the AI use. Heppner went to Claude on his own, and only later shared some of the output with his lawyers. That fact pattern is going to matter.
The Rule, in Two Sentences
Judge Rakoff's holding is narrower than the headlines suggest, and sharper. The core of the ruling is this: because Claude is not an attorney, communications with it cannot be attorney-client privileged; and because the consumer product's terms of service permit Anthropic to retain inputs, train on them, and disclose them to "governmental regulatory authorities" under appropriate circumstances, no user has a reasonable expectation of confidentiality in what they type into it. As the Debevoise Data Blog summary put it, the opinion's framing is that all "recognized privileges" require "a trusting human relationship," and a consumer chatbot with a commercial privacy policy is not that.
On work product, the court was equally compact: the doctrine protects materials prepared by an attorney, or by a party at an attorney's direction, in anticipation of litigation. Heppner's Claude outputs were neither. And even if some of the inputs carried privilege at the moment he typed them, he waived it by sharing them with a third party — Anthropic — the moment the request went out over the wire. That waiver reasoning appears in a footnote, but it is the most important piece of the opinion for day-to-day practice.
The Footnote Every Lawyer Will Argue Over
The opinion does not close the door on AI-assisted litigation work. As summarized across firm commentary on the ruling — I have not yet pulled the full slip opinion off PACER to verify the exact wording — Judge Rakoff's hypothetical dicta essentially says that if counsel had directed Heppner to use Claude, the tool might have functioned as a lawyer's agent in a way that could bring it within the protection of the attorney-client privilege. That is the Kovel analogy, drawn from United States v. Kovel, 296 F.2d 918 (2d Cir. 1961), which extended privilege to an accountant retained by counsel to assist with the legal representation. Applied to AI, the opinion is hinting at three conjunctive conditions: (a) counsel directs the use, (b) the platform is not a consumer tier that trains on or can disclose inputs, and (c) the engagement looks, in substance, like a Kovel arrangement. Several firms — Proskauer, Jones Walker, and Gibson Dunn — have read the dicta the same way. None of them say the question is settled.
A Second Court, the Same Day, a Different Answer
Heppner did not drop in a vacuum. On the same day, February 10, 2026, a magistrate judge in the Eastern District of Michigan reached a very different conclusion in a civil discovery dispute, Warner v. Gilbarco, Inc. A pro se plaintiff had used ChatGPT to help prepare filings, and the defense moved to compel production of the chat logs. The magistrate refused. The reasoning, as recounted in secondary coverage of the order, turns on treating ChatGPT and similar tools as tools rather than as persons — even if administrators sit somewhere behind them — and treating AI-reformatted litigation preparation as a litigant's own mental impressions run through software. Under that framing, the output is classic work product.
In the Harvard Law Review post that prompted this write-up, Cindy X. Guo argues that Warner has the better of the analysis. Her critique of Heppner is that Rakoff treated Claude as a non-attorney human rather than as a tool, and that a more context-sensitive reading would recognize that some AI use, especially counsel-directed use on platforms with appropriate contractual protections, should "at least sometimes qualify for privilege." For now, litigants need to assume that the federal district courts are split and that Rakoff's framing — consumer AI use equals third-party disclosure equals waiver — is the one the government will press everywhere it can.
Three Architectures, Three Privilege Postures
Once you strip away the specifics, every legal AI workflow lives in one of three places: a consumer or marketplace-hosted environment that you do not fully control, a managed enterprise API with a contract that narrows retention and training, or a model you run on hardware you own. Each has real advantages and real costs, and each is entitled to an honest comparison.
The consumer or marketplace tier
This is the Heppner case. It also captures most uses of public ChatGPT, the free tier of Claude, and — in a slightly different shape — GPU marketplaces like RunPod's Community Cloud, where the actual host machine is a third party renting GPU time to the marketplace. RunPod is transparent about the split: their Secure Cloud runs in vetted Tier-3/4 data centers they operate, while Community Cloud pods run on peer-operator hardware. Their Terms of Service reserve the right to "disclose any content or records concerning Your account as necessary to satisfy any law, regulation, or other governmental request," and do not specify a post-termination data-deletion timeline on customer volumes. I have not found any publicly reported RunPod customer-data incidents as of April 2026, but the relevant research literature on GPU memory residue — CUDA Leaks, Naghibijouybari et al. (CCS 2018), NVBleed (2025), and Owl (DSN 2024) — establishes that GPUs do not reliably clear VRAM on deallocation. That is a research-demonstrated risk, not a documented in-the-wild marketplace exploit, but the exposure is structural.
For privilege analysis, the problem is simpler than any of that. Every one of these environments involves third-party disclosure at the moment a prompt leaves your machine. Under Heppner's reasoning, that is the waiver trigger, full stop.
The managed enterprise API
This is the path Rakoff's dicta gestures at. As of April 2026, the major vendors have converged on roughly the same shape of offering for commercial use. Anthropic's commercial API retains inputs and outputs for seven days and does not train on commercial data, with zero-data-retention available by addendum. OpenAI offers ZDR by enterprise contract on top of its Team/Enterprise tiers, which already exclude customer data from training. Google's Vertex AI documents a 24-hour default caching window with ZDR by request. All three permit human abuse review outside of ZDR. A firm that signs the enterprise addendum, layers a BAA or equivalent, and has counsel direct the AI use is doing something materially different from what Heppner did — but the enterprise-tier privilege theory is still untested. Rakoff left it open; he did not bless it.
The local model
A locally run model — on a GPU inside a machine in your office, on your network — involves no third-party disclosure at any stage. No prompt leaves the building. No output is cached on someone else's server. No retention policy applies because no retention is happening. The waiver question at the heart of Heppner never gets off the ground, because the disclosure element is missing.
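To make "no prompt leaves the building" concrete, here is a minimal sketch of what local inference looks like, assuming an Ollama server on its default loopback port. The model tag and the prompt are illustrative, and this is not Sound Suite's own code:

```ts
// Minimal sketch: querying a local Ollama server over the loopback interface.
// The prompt and the response never touch a third-party network.
const res = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.3:70b', // any open-weight model you have pulled locally
    prompt: 'List every date mentioned in the following deposition excerpt: ...',
    stream: false,
  }),
});
const { response } = await res.json();
console.log(response);
```

The only socket opened is to localhost, so the disclosure element of a Heppner-style waiver argument never comes into existence.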
That is the architectural posture Sound Suite was built for. It is also not a free lunch. Local models have real limitations I will be honest about further down.
The Risk Matrix, Drawn Honestly
| Risk dimension | Local GPU (Sound Suite default) | GPU marketplace (e.g. RunPod) | Managed API — default tier | Managed API — ZDR + enterprise contract |
|---|---|---|---|---|
| Data custody | You, only | Marketplace + host | Vendor | Vendor, no retention |
| Privilege risk under Heppner | Low — no third-party disclosure | High — third-party disclosure + host trust | High for consumer-like use | Moderate — untested but aligned with Rakoff's dicta |
| Training on your data | Never | Never under ToS | Not for API/commercial tiers by default | No |
| Provider logging and human review | None | Provider + host-level access possible | 7–30 day logs; human review possible | None |
| Subpoena exposure | Your premises only | Marketplace + host reachable under CLOUD Act and Rule 45 | Vendor reachable | Vendor reachable, less to produce |
| Reproducibility | Full, with pinned weights | Image reproduces; hardware does not | Model versions deprecate | Model versions deprecate |
| Capital cost | $3–10k workstation + power | $0.30–2/hr while running | Per-token | Per-token + enterprise minimums |
| Capability ceiling | 70B-class open-weight models | Any open weight you bring | Frontier | Frontier |
None of this is a one-dimensional ranking. For a solo employment lawyer dealing with a handful of sensitive matters, a local workstation running a 70B model is genuinely the strongest privilege posture available anywhere. For a firm doing high-stakes antitrust work that needs GPT-5-class reasoning, a negotiated enterprise contract with ZDR is almost certainly the only workable answer. The wrong answer, for almost everyone, is the consumer tier or an unvetted Community Cloud pod.
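On the cost row specifically, the crossover between owning and renting arrives faster than most people expect. A purely illustrative calculation, using assumed mid-range figures from the table rather than anyone's actual quote:

```ts
// Illustrative break-even between buying a local workstation and renting marketplace GPUs.
// Both figures are assumptions drawn from the middle of the ranges in the table above.
const workstationCostUsd = 5_000; // mid-range local build
const rentalUsdPerHour = 1.0;     // mid-range marketplace rate
const breakEvenHours = workstationCostUsd / rentalUsdPerHour;
console.log(breakEvenHours);      // 5000 GPU-hours, roughly 2.5 years at 40 hours per week
```

This ignores electricity and the per-token pricing of managed APIs, but it shows why the capital cost of a local rig is less forbidding than it first appears for a practice that uses the tool steadily.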
How Sound Suite Is Built So This Question Never Arises
I want to ground the architectural claims in specific code, because a privacy story that is not verifiable is just marketing. Sound Suite is source-available under the Polyform Noncommercial license — the source is on GitHub and you can read and modify it, but the license reserves commercial use, so strictly speaking it is not OSI-open-source. Every architectural statement below points at a file you can verify for yourself.
- Embeddings are local by default. The default vector embedding provider is a local copy of `Xenova/all-MiniLM-L6-v2` running under `@xenova/transformers`, selected in `src/lib/db/config.ts` and executed in `src/lib/ingestion/transformers-embedding-provider.ts`. After the first-time model download from Hugging Face into `~/.cache/transformers/`, no further network calls are made to generate embeddings. (A minimal sketch of the underlying library call appears after this list.)
- OCR is local. Scanned pages go through `tesseract.js` running in a forked child process (`src/lib/ingestion/ocr-engine.ts`). The engine does not phone home.
- The reranker is local. The optional reranker runs a `Qwen/Qwen3-Reranker-8B` model inside a vLLM Docker container bound to `localhost:8099` (see `scripts/start-reranker.sh`). After the container image and model weights are cached, it runs without internet.
- Storage stays in the install directory. SQLite metadata lives at `./data/sound-suite.db`, vector data at `./data/lancedb/`, and extracted exhibits under `public/exhibits/`. No path escapes the project directory.
- There is no telemetry. A grep for Sentry, PostHog, Amplitude, Mixpanel, Segment, `gtag`, Plausible, and Umami across the source tree returns nothing. There is no update check. There is no analytics beacon. The app does not call any domain other than whichever inference endpoint you explicitly configure.
- Cloud providers are opt-in, not default. The OpenAI and Claude embedding providers exist (`src/lib/ingestion/openai-embedding-provider.ts`, `src/lib/ingestion/claude-embedding-provider.ts`), and Ollama is supported for both embeddings and optional completion-based summarization (`src/lib/ingestion/document-summarizer.ts`). Every one of these has to be selected in the admin settings and requires an API key. The default install does not send a single byte to any of them.
- The MCP server has three auth modes. The Model Context Protocol server supports `none`, `apikey`, and `oauth` modes (`src/lib/mcp/mcp-server.ts`). The default is `none`, which is only safe on a loopback bind. For any exposed deployment, set an API key or OAuth.
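For the embeddings claim in particular, here is the sketch promised above: the library call that local embedding reduces to, assuming the same `Xenova/all-MiniLM-L6-v2` model named in the first bullet. It is illustrative, not Sound Suite's actual provider code:

```ts
// Minimal sketch: generating a sentence embedding entirely on the local machine.
// The first run downloads the model into the local cache; later runs are offline.
import { pipeline } from '@xenova/transformers';

const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const output = await embed('privileged work product that never leaves this machine', {
  pooling: 'mean',
  normalize: true,
});
console.log(output.dims); // [1, 384]: a 384-dimensional sentence vector
```

Once the model files are cached, the same call works with the network cable unplugged, which is the property the rest of this section depends on.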
If you want to confirm this end-to-end, the agent who audited the codebase produced a full line-cited breakdown, and the most interesting finding was the one I was not expecting: Sound Suite is genuinely air-gap-viable after a single online session to fetch model weights. The pipeline, from PDF ingestion through OCR through chunking through embedding through vector search through MCP tool responses, completes with no outbound network traffic once those weights are cached. That is the architectural property Heppner was really about. Sound Suite did not add it after the ruling; the ruling simply clarified why it matters.
What This Does Not Fix
Being honest about the limits of local AI matters as much as being honest about the risks of cloud AI. Several things remain your problem on a local system:
- Physical custody. If someone steals the machine, they have your case files. Encrypt the disk. ABA Formal Opinion 483 is the relevant notification-obligation guidance.
- Local malware. An endpoint compromise is still an endpoint compromise. Your operating system, browser, and email client are the weak points.
- Open-weight model licenses. The Llama 3.3 license is not OSI-open — it restricts use above a large MAU threshold and prohibits using outputs to train other models (text; analysis). For most solo and small-firm use, it is fine. Mistral and DeepSeek offer more permissive licenses. The SaulLM series is an MIT-licensed Mixtral-base model fine-tuned specifically on legal corpora.
- Capability gap. Llama 3.3 70B at 4-bit quantization needs roughly 48 GB of VRAM, meaning a dual RTX 3090/4090 rig or an RTX 6000 Ada, and it does not match frontier reasoning models on the hardest tasks. If your matter genuinely requires Opus or GPT-5-class reasoning, a local 70B is not a direct substitute. You can still use a local model for the privilege-sensitive work — document search, pattern scanning, timeline extraction, citation analysis — and escalate only the depersonalized, non-client-identifying synthesis to an enterprise API tier where that trade is defensible.
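For a rough sense of where that 48 GB figure comes from, a back-of-the-envelope using ballpark assumptions rather than benchmarks:

```ts
// Back-of-the-envelope VRAM estimate for a 70B-parameter model at 4-bit quantization.
// Ballpark assumptions only; real footprints vary with context length and runtime.
const params = 70e9;
const bytesPerParam = 0.5;                        // 4 bits per weight
const weightsGb = (params * bytesPerParam) / 1e9; // about 35 GB for the weights alone
// KV cache, activations, and quantization overhead add several more GB,
// which is why two 24 GB cards (48 GB total) is the practical floor.
console.log(weightsGb); // 35
```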
A Practical Playbook If You Are in Litigation Right Now
- Stop using the consumer tier of any AI product for matters you care about. The Heppner holding does not require a perfect fact match to come back to bite you; the ToS-plus-training-plus-disclosure chain is the same across consumer products.
- If you must use a cloud model, use an enterprise contract with ZDR, directed by counsel, and keep a Kovel-style engagement letter in the file. That is the defensible cloud posture; it is also the posture Rakoff's dicta gestures at but does not endorse.
- For anything you would not put in an email subject line — custody files, financial forensic work, whistleblower paperwork — put it through a local tool. Sound Suite is one option. Ollama plus a 7B or 13B model is another. The point is not the product, it is the architectural property: no prompt leaves the building.
- Back up your disk, encrypt it, and keep your machine patched. Local does not mean careless.
- If you are pro se and reading this because the legal system has not given you a choice about representing yourself: this is exactly the gap Sound Suite was built to fill. You can clone the repository, point it at a folder of PDFs, and start asking questions within an hour on commodity hardware, with no account to sign up for and nothing leaving your machine. The documentation is here.
A Final Word on Trust
The deepest lesson from Heppner is not about artificial intelligence. It is about what trust looks like in a legal system. Privilege is a fragile doctrine. It survives by being honored carefully, and it breaks when the environment around it changes faster than practitioners realize. For thirty years, "send it via the firm's email" was a defensible answer for confidential work. For ten years, "put it in the firm's cloud drive with BAA-covered terms" was a defensible answer. For the last three, a surprising number of lawyers and pro se litigants have been typing work product into chat windows whose terms of service explicitly permit disclosure to governmental authorities. Heppner is the first federal opinion to say out loud what was structurally true the whole time.
Local AI is not a perfect answer. But it is the only architecture in which the Heppner question simply does not arise. If you build on top of it, you are not relying on a contract to protect your client's confidences; you are relying on physics. Sound Suite exists because, for the people who need this tool most, that is the only kind of protection they can actually count on.
The author is the builder of Sound Suite and is not an attorney. Nothing in this post is legal advice. Direct quotations attributed to the Heppner opinion and the Warner v. Gilbarco order in this post are drawn from secondary coverage; practitioners relying on either ruling should consult the slip opinions directly.