Getting Started

Overview

Sound Suite is a local, self-hosted document intelligence platform designed for legal case management. It monitors directories on your machine for court PDFs, processes them through a hybrid OCR and vector pipeline, and exposes structured data via the Model Context Protocol (MCP) for AI-powered analysis.

What Sound Suite Does

  • Monitors directories for new PDF filings, briefs, and records
  • Extracts text from PDFs using direct extraction and OCR for scanned pages
  • Identifies exhibits (photos, diagrams, charts) embedded in documents
  • Generates vector embeddings for semantic search across all your case documents
  • Exposes 14 MCP tools that AI assistants (Claude, GPT, etc.) can use to analyze your cases
  • Provides a web dashboard at localhost:3000 for browsing cases and documents and for running searches

Privacy First

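Everything runs on your machine. No documents are sent to external servers for processing. Embedding generation can use local models (Ollama, Transformers.js) or cloud providers (OpenAI, Anthropic), depending on your preference. If you choose a cloud provider, the text being embedded is sent to that provider, so use a local model when nothing should leave your machine.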


Quick Start

Get Sound Suite running in under 5 minutes:

# Clone the repository
git clone https://github.com/alperu/soundsuite.git
cd soundsuite

# Install dependencies
npm install

# Generate database client and build
npx prisma generate
npm run build

# Start all services
npm run svc:start

After starting, open your browser to the web dashboard at localhost:3000.

Add Your First Case

  1. Open the dashboard and navigate to Case Management
  2. Click the + button to create a new case
  3. Enter a case name and select the directory containing your PDF files
  4. Sound Suite will automatically discover and begin processing documents

System Requirements

Minimum Requirements

Component   Requirement
OS          macOS 12+, Ubuntu 20.04+, Windows 10+
Node.js     v18.0 or later
RAM         4 GB minimum
Disk        2 GB for application + space for your documents
Browser     Chrome, Firefox, Safari, or Edge (latest)
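
To confirm the Node.js requirement before installing, a check along these lines can help. This is a minimal sketch: the helper name meetsMinimumNode is hypothetical and not part of Sound Suite, and it compares only the major version.

```typescript
// Hypothetical helper (not part of Sound Suite): true when a Node.js version
// string such as "18.19.0" or "v20.11.1" has a major version >= minMajor.
function meetsMinimumNode(version: string, minMajor: number = 18): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// process.versions.node holds the running Node.js version without a "v" prefix.
if (!meetsMinimumNode(process.versions.node)) {
  console.warn(`Node.js ${process.versions.node} is older than the required v18.0`);
}
```

Running `node --version` in a terminal gives the same information without any code.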

Recommended

Component   Recommendation
RAM         8 GB or more (for local embedding models)
CPU         Apple Silicon or modern x86-64 with AVX2
GPU         Optional; accelerates local embedding generation

Optional Dependencies

  • Ollama — for local embedding models (qwen3-embedding, nomic-embed-text)
  • Redis — for caching and improved search performance
  • 7-Zip — for automatic database backups

Core Concepts

Cases

A case represents a legal matter and is linked to a directory on your file system. All PDFs within that directory (and subdirectories) are treated as filings belonging to the case.

Documents and Filings

Each PDF file is a document. Documents are organized by filing type (briefs, clerk's records, reporter's records, etc.). Sound Suite uses SHA-256 hashing to recognize files it has already seen, which prevents duplicate processing.
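
The idea behind hash-based duplicate detection can be sketched as follows. This is an illustration only, assuming the hash is taken over the file's bytes: the SeenDocuments class is hypothetical, it keeps hashes in memory, and it does not reflect how Sound Suite actually stores them.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of a document's bytes.
function sha256Hex(bytes: Uint8Array | string): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical in-memory registry; a real system would persist the hashes.
class SeenDocuments {
  private readonly hashes = new Set<string>();

  // Returns true the first time some content is registered, false for repeats.
  register(bytes: Uint8Array | string): boolean {
    const digest = sha256Hex(bytes);
    if (this.hashes.has(digest)) return false;
    this.hashes.add(digest);
    return true;
  }
}

const seen = new SeenDocuments();
console.log(seen.register("brief contents")); // first sighting of this content
console.log(seen.register("brief contents")); // identical bytes, so a duplicate
```

Because the digest depends only on content, a renamed or moved copy of the same PDF produces the same hash.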

Document Processing Pipeline

Documents flow through a multi-stage pipeline:

  1. Discovery — File watcher detects new PDFs
  2. Text Extraction — pdfjs-dist extracts text from each page
  3. OCR — tesseract.js processes pages with low text density
  4. Exhibit Detection — Images are extracted and cataloged
  5. Chunking — Text is split into overlapping chunks for embedding
  6. Embedding — Vector embeddings are generated for each chunk
  7. Indexing — Chunks are stored in LanceDB for semantic search
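
Two of the stages above, the OCR decision and overlapping chunking, can be sketched in a few lines. This is a simplified illustration, not Sound Suite's implementation: the function names, the 50-character threshold, and the character-based chunk sizes are all assumptions.

```typescript
// Assumed heuristic: a page whose extracted text has very few non-whitespace
// characters is probably a scan and should go through OCR.
function needsOcr(pageText: string, minChars: number = 50): boolean {
  return pageText.replace(/\s+/g, "").length < minChars;
}

// Split text into chunks of up to chunkSize characters, where each chunk
// starts `overlap` characters before the previous chunk ended.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 2)); // [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.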

Status Lifecycle

Documents progress through these states:

  • QUEUED — Discovered, waiting to be processed
  • PROCESSING — Currently being extracted and embedded
  • INDEXED — Successfully processed and searchable
  • PARTIAL — Some pages processed, others had errors
  • ERROR — Processing failed (retryable)
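
One way to model these states in code is a union type plus a transition table. The states come from the list above, but the allowed transitions are an assumption made for illustration; in particular, retrying is modeled here as a return to QUEUED.

```typescript
type DocumentStatus = "QUEUED" | "PROCESSING" | "INDEXED" | "PARTIAL" | "ERROR";

// Assumed transition table: the documentation names the states but does not
// spell out which moves between them are allowed.
const allowedTransitions: Record<DocumentStatus, readonly DocumentStatus[]> = {
  QUEUED: ["PROCESSING"],
  PROCESSING: ["INDEXED", "PARTIAL", "ERROR"],
  INDEXED: [],
  PARTIAL: ["QUEUED"], // assumption: partially processed documents can be re-queued
  ERROR: ["QUEUED"], // "retryable" modeled as going back to the queue
};

function canTransition(from: DocumentStatus, to: DocumentStatus): boolean {
  return allowedTransitions[from].includes(to);
}

console.log(canTransition("QUEUED", "PROCESSING")); // true
console.log(canTransition("INDEXED", "QUEUED")); // false
```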

MCP (Model Context Protocol)

MCP is an open protocol that allows AI assistants to call tools exposed by external servers. Sound Suite's MCP server provides 14 specialized legal analysis tools that any MCP-compatible AI client can use.
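
On the wire, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request. The tool name search_case_documents and its arguments are hypothetical placeholders, since the names of Sound Suite's 14 tools are not listed here.

```typescript
// Shape of an MCP tool invocation: a JSON-RPC 2.0 request whose method is
// "tools/call" and whose params carry the tool name and its arguments.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

// "search_case_documents", "query", and "limit" are illustrative names only.
const toolCall = buildToolCall(1, "search_case_documents", {
  query: "motion to suppress",
  limit: 5,
});
console.log(JSON.stringify(toolCall, null, 2));
```

In practice an MCP-compatible client constructs and sends these messages for you; the sketch only shows what travels between the client and the server.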

Vector Search

Sound Suite uses LanceDB to store vector embeddings of document chunks. When you or an AI assistant searches your case documents, the system finds the most semantically relevant passages across all documents, even if they don't contain the exact keywords you searched for.
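
The ranking idea behind semantic search can be shown with plain cosine similarity. This is a conceptual sketch that scans every chunk in memory; it does not use the LanceDB API, the similarity metric Sound Suite uses is not stated here, and the names are illustrative.

```typescript
interface EmbeddedChunk {
  id: string;
  vector: number[];
}

// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the query vector and keep the k best matches.
function topMatches(
  query: readonly number[],
  chunks: readonly EmbeddedChunk[],
  k: number,
): { id: string; score: number }[] {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Because the comparison happens between embedding vectors rather than words, a passage can rank highly without sharing any keyword with the query.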