A deliberate design choice

No AI in your data path. That's the point.

Once your install kit ships, what runs on your machine is deterministic recognizer matching: pages either fit a pattern I confirmed against your samples, or they're parked in UNKNOWN. Loud failures. No silent misclassification. No model calls. No tokens. No usage dashboard.

Email your sample docs · Read the security audit

A$29/mo · 14-day money-back guarantee · Windows · macOS · Linux

The three failure modes

Three things AI-at-runtime doc-filers do wrong.

Not a hit-piece on any one product. The same three failure modes show up across the category whenever the LLM is doing the live filing decision. And any auditor on a serious matter will hit at least one of them.

Classifies wrong, and you don't know why

An LLM doc-filer reads a page, decides it's an "invoice" with 87% confidence, and drops it in the invoices folder. When it's wrong (and on a long enough timeline it is wrong), there's no audit trail explaining the decision. You only find out when you go looking for a signed contract and it's filed under "delivery dockets."

The Matrix can't make that mistake. Each recognizer fires on a deterministic pattern I built from your real samples; pages that don't match a recognizer park in UNKNOWN for a human to look at, never silently routed somewhere wrong.

Trains on your documents

Most AI doc-filer terms of service grant the vendor a licence to use your documents to "improve the service." For a freelancer that means nothing. For anyone handling privileged matter, health information, or sensitive client data, that one clause is the deal-breaker.

The Matrix has no runtime document ingest. The samples you email once at setup are deleted after 30 days; ChunkLand never sees the scans your installed Matrix processes day-to-day.

Needs internet

If your hotel Wi-Fi is broken, your home internet is down, your laptop's on a flight, or your client engagement letter says "no cloud," an LLM-at-runtime doc-filer is dead in the water. The model lives on the vendor's servers. Inference happens off-site. No internet, no inference.

The Matrix runs on your machine. Sorting works air-gapped. A licence check runs on first launch, followed by a 7-day rolling offline grace period. The recognizers are local files; nothing phones home to file a page.
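A rolling offline grace period can be pictured as a plain timestamp comparison. The sketch below is hypothetical: the function name `licence_usable`, storing the last successful check as a UTC timestamp, and the rule that every successful online check restarts the 7-day window are all assumptions, not ChunkLand's licensing code. What happens when the window lapses isn't described on this page, so it's left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Window length taken from the copy above; everything else here is assumed.
OFFLINE_GRACE = timedelta(days=7)


def licence_usable(last_successful_check: Optional[datetime], now: datetime) -> bool:
    """Hypothetical rule for a rolling offline grace period.

    last_successful_check is the time of the most recent successful online
    licence check (None if the first-launch check has never succeeded).
    Each successful check would overwrite that timestamp, restarting the
    7-day window; that reset is what makes the grace period "rolling".
    This function only compares local timestamps and makes no network call.
    """
    if last_successful_check is None:
        return False
    return now - last_successful_check <= OFFLINE_GRACE


if __name__ == "__main__":
    checked = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
    print(licence_usable(checked, checked + timedelta(days=6)))  # True: inside the window
    print(licence_usable(checked, checked + timedelta(days=8)))  # False: grace has lapsed
    print(licence_usable(None, checked))                         # False: no first-launch check yet
```

The point of the design is that the check is local arithmetic: a laptop on a flight only needs a recent timestamp, not a connection.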

How we do it instead

Build path: AI configures. Runtime path: deterministic recognizers.

If you can read the source, you can verify the claim.

The build path (one-off, my side)

Teaching an AI to reliably classify the documents you actually file costs thousands of dollars in compute and weeks of iteration. I already did that work, on a local AI running on my own machine: not in someone else's cloud, and not in a service that trains on your data. Your samples never get uploaded to a public AI.

You email samples of your everyday documents to setup@chunkland.com (the more, the better for accuracy). I open each one and read it by hand, looking for the stable patterns (sender blocks, account-number formats, "Work Order #" prefixes, supplier barcodes, date placements). Each pattern becomes a recognizer entry in your provision.json. Things like:

regex_text: a tight regex against the OCR'd top-of-page block, e.g. ^WO\s+(\d{8})\s to capture a work-order number from a printed header.
barcode_capture: Code 128 / Pegasus capture groups for documents that already carry barcodes (we read what's on your paper; we don't add anything to it).
layout_anchor: positional anchors for documents whose key data sits in the same place every time (statement headers, invoice blocks).
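To make the three recognizer kinds concrete, here is a minimal sketch of how entries like these might be represented and matched. Only the kind names and the ^WO\s+(\d{8})\s pattern come from this page. The provision.json keys, the `Page`/`Word` model, the normalised region coordinates, the other two patterns, and the `matches` helper are all hypothetical. Barcode values are assumed to have already been decoded from the page by some local reader, which isn't shown.

```python
import json
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical provision.json. The real schema isn't published on this page;
# the keys "id", "kind", "pattern", "region" and "dest" are illustrative only.
PROVISION_JSON = r"""
{
  "recognizers": [
    {"id": "work-order", "kind": "regex_text",
     "pattern": "^WO\\s+(\\d{8})\\s", "dest": "WorkOrders/"},
    {"id": "supplier-barcode", "kind": "barcode_capture",
     "pattern": "^INV-(\\d{6})$", "dest": "Invoices/"},
    {"id": "bank-statement", "kind": "layout_anchor",
     "pattern": "^Statement Period$", "region": [0.55, 0.0, 1.0, 0.15],
     "dest": "Statements/"}
  ]
}
"""


@dataclass
class Word:
    text: str
    x: float  # left edge as a fraction of page width (assumed OCR output)
    y: float  # top edge as a fraction of page height


@dataclass
class Page:
    top_text: str  # OCR'd top-of-page block
    words: List[Word] = field(default_factory=list)  # OCR'd words with positions
    barcodes: List[str] = field(default_factory=list)  # values already printed on the page


def matches(rec: dict, page: Page) -> Optional[str]:
    """Return the captured value if this recognizer fires on the page, else None."""
    rx = re.compile(rec["pattern"], re.MULTILINE)  # ^ and $ anchor to line boundaries
    kind = rec["kind"]
    if kind == "regex_text":
        candidates = [page.top_text]
    elif kind == "barcode_capture":
        candidates = page.barcodes  # decoding the barcode itself is out of scope here
    elif kind == "layout_anchor":
        x0, y0, x1, y1 = rec["region"]
        inside = [w.text for w in page.words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        candidates = [" ".join(inside)]  # only text inside the fixed region counts
    else:
        raise ValueError(f"unknown recognizer kind: {kind}")
    for text in candidates:
        m = rx.search(text)
        if m:
            # Return the capture group if the pattern has one, else the whole match.
            return m.group(1) if rx.groups else m.group(0)
    return None


if __name__ == "__main__":
    recognizers = json.loads(PROVISION_JSON)["recognizers"]
    page = Page(top_text="WO 12345678 Site: Northgate\nAttn: Maintenance")
    for rec in recognizers:
        print(rec["id"], matches(rec, page))
```

Every branch is a fixed test on the page's own text or position: there is no score to tune, so a recognizer either fires or it doesn't.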

I check each recognizer against your samples until matches are clean and false positives are zero. Then I bundle the kit and email it to you.
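The "clean matches, zero false positives" bar can be checked mechanically against labelled samples. This is a hypothetical sketch, not the contents of scripts/intake/configure_kit.py: it handles only regex_text patterns, and the sample texts, recognizer ids, and the `check_recognizers`/`is_clean` helpers are made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical labelled sample: OCR'd top-of-page text plus the recognizer id
# a human decided it should file under (None = it should stay in UNKNOWN).
Sample = Tuple[str, Optional[str]]


def check_recognizers(patterns: Dict[str, str], samples: List[Sample]) -> Dict[str, Counter]:
    """Count hits, false positives and misses for each regex_text recognizer."""
    report = {rid: Counter(hits=0, false_positives=0, misses=0) for rid in patterns}
    for text, expected in samples:
        for rid, pattern in patterns.items():
            fired = re.search(pattern, text, re.MULTILINE) is not None
            if fired and rid == expected:
                report[rid]["hits"] += 1
            elif fired:
                report[rid]["false_positives"] += 1  # fired on someone else's document
            elif rid == expected:
                report[rid]["misses"] += 1  # should have fired but didn't
    return report


def is_clean(report: Dict[str, Counter]) -> bool:
    """Every expected sample matched, and no recognizer fired where it shouldn't."""
    return all(c["false_positives"] == 0 and c["misses"] == 0 for c in report.values())


if __name__ == "__main__":
    patterns = {
        "work-order": r"^WO\s+(\d{8})\s",
        "remittance": r"^REMITTANCE ADVICE\b",
    }
    samples = [
        ("WO 12345678 Site: Northgate", "work-order"),
        ("WO 87654321 Site: Eastside", "work-order"),
        ("REMITTANCE ADVICE\nPaid: 2024-03-01", "remittance"),
        ("Quote Q-2231 for WO 11112222 follow-up", None),  # mentions a WO mid-line
    ]
    report = check_recognizers(patterns, samples)
    for rid, counts in report.items():
        print(rid, dict(counts))
    print("clean:", is_clean(report))
```

The last sample is why "tight" matters: drop the leading ^ from the work-order pattern and it fires on the quote too, which this check reports as a false positive.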

scripts/intake/configure_kit.py

The runtime path (every day, your side)

Your installed Matrix loads provision.json on startup and watches the folder you pointed it at. When a scan lands:

Each page is OCR'd locally. We use OCR to read text, never to classify document type.
Each recognizer is tried in order. Match = filing decision. No match across the whole set = page parks in UNKNOWN.
Same input → same output. No model, no probability, no confidence score on the page itself.
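The runtime steps above amount to a first-match loop with a fallback. The names `match()`, `route()` and `RecognizerMiss` appear on this page, but their signatures and bodies here are guesses, not the contents of src/runtime/recognizers.py. OCR is assumed to have already produced the page text, and only regex_text recognizers are modelled. The returned "reason" string is an illustrative extra showing that a deterministic decision can always name the rule that made it.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


class RecognizerMiss(Exception):
    """Raised when no recognizer matches; the caller parks the page in UNKNOWN/."""


@dataclass(frozen=True)
class Recognizer:
    rid: str      # hypothetical recognizer id
    pattern: str  # regex_text pattern applied to the OCR'd top-of-page block
    dest: str     # destination folder when this recognizer fires


def match(recognizers: List[Recognizer], page_text: str) -> Tuple[Recognizer, str]:
    """Try recognizers in order; the first one that fires is the filing decision."""
    for rec in recognizers:
        m = re.search(rec.pattern, page_text, re.MULTILINE)
        if m:
            captured = m.group(1) if m.re.groups else m.group(0)
            return rec, captured
    raise RecognizerMiss("no recognizer matched")


def route(recognizers: List[Recognizer], page_text: str) -> Tuple[str, str]:
    """Return (destination folder, reason). No model, no scores, no probabilities."""
    try:
        rec, captured = match(recognizers, page_text)
    except RecognizerMiss:
        return "UNKNOWN/", "no recognizer matched"
    return rec.dest, f"{rec.rid} captured {captured!r}"


if __name__ == "__main__":
    recognizers = [
        Recognizer("work-order", r"^WO\s+(\d{8})\s", "WorkOrders/"),
        Recognizer("remittance", r"^REMITTANCE ADVICE\b", "Remittances/"),
    ]
    print(route(recognizers, "WO 12345678 Site: Northgate"))
    # ('WorkOrders/', "work-order captured '12345678'")
    print(route(recognizers, "Brand-new supplier letterhead"))
    # ('UNKNOWN/', 'no recognizer matched')
```

Because the loop is ordered and the patterns are fixed, running the same page through it twice gives the same folder and the same reason every time.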

If a page lands in UNKNOWN (a new supplier, a redesigned form, a document type you didn't email at setup), you reply to the original setup email with a sample, and I send back an updated provision.json. The recognizer set grows by hand, in a way you can read.

RecognizerMiss("no recognizer matched: parked in UNKNOWN/")

src/runtime/recognizers.py · match() / route()

The honest part

Where an LLM at runtime would actually be useful, and why we still won't add it.

There are two places an LLM would genuinely help in the runtime path. We've thought about both. Here's why neither one ships.

1. Auto-classifying brand-new document types

You signed up a new supplier this week. Their first invoice doesn't match any recognizer in your kit. An LLM at runtime could read it, infer "this is an invoice from Acme," and propose a destination. Useful.

We're not adding it. The moment an LLM is in the runtime path, our marketing has to start saying "the vendor's model reads your documents," and the security claim that holds up the rest of the product (your scans never traverse the internet, never reach an LLM at runtime, never appear on someone else's server) gets a footnote. Audit-grade products don't have footnotes.

The boring answer wins: when something parks in UNKNOWN, you reply to your setup email with the sample, and I add a recognizer by hand. Two minutes my side, deterministic forever after.

2. Summarisation across your archive

"Show me every signed agreement from this contract." "Find me the 2023 invoices for that one supplier." Summarising or querying across thousands of filed pages is a real workflow and an LLM could nail it.

We're not adding it, for the same reason. Summarisation requires reading document text at scale. That means either uploading to a cloud LLM (which breaks the no-cloud claim) or shipping a local model (which massively expands the binary, requires a GPU, and opens a model-update surface). Neither is a trade we'll make against the audit-grade promise.

If you need cross-document search on your local archive, the answer is the search tooling you already have: Spotlight on macOS, Windows Search on Windows, recoll (full-text) or fzf (filenames) on Linux. They index and search your files locally. The Matrix files the documents into the right folders so those tools have something predictable to search across.

Why this matters to a buyer

"No AI in your data path" is a product feature.

Half the doc-management category right now is racing to put "AI-powered" in the headline of the runtime product. We're going the other way, on purpose. When anyone (a privacy reviewer, an auditor, an opposing solicitor, you) asks "does the vendor's tool ever see the document content at runtime?", the answer is no, and that answer never changes between releases. We can't change it without rewriting the architecture, and we have no plans to.

The setup samples are a bounded, one-time exception. You knowingly send a handful of documents, kept for 30 days then deleted, used only to configure your kit. After that the runtime is offline-clean.

That's the trust signal. Not a marketing claim. A choice that constrains every future release of the product.

Email your sample docs.

3-5 of the documents that fill your scanner each week. I'll configure the recognizers around them and ship the kit back inside 48 hours. A$29/month, 14-day money-back guarantee. Runs on your computer.
