"Couldn't I just write this myself?"
Yes. You could write a generic doc filer in a weekend. Pdfplumber, watchdog, a regex or two, you're done. I know. That's how this product started.
What you can't write in a weekend is the recognizer set personalised to your paperwork, plus the runtime hardening that survives real beta installs. That's what an A$29/month subscription buys you.
A$29/mo · 14-day money-back guarantee · cancel any month
The personalisation is the product.
Generic filers get you 70% of the way there. The remaining 30%, the recognizers tuned to your suppliers, your forms, and your document layouts, is what eats months when you DIY.
Things a "generic" filer can't do
- Tell your work-orders apart from your delivery dockets. Both have barcodes. Both have a date. Both have an account number. The thing that distinguishes them is a stable header phrase, a barcode prefix, or a layout anchor specific to your suppliers. Encoding that for one company is an afternoon. Encoding it for your company without seeing your paperwork is impossible.
- Survive Acme Pty Ltd renumbering their invoice template. Your weekend script breaks. My setup pipeline catches it because I read each new sample by hand and update the recognizer; you'll catch it three weeks later when your filing has been quietly wrong.
- Reject Code 128 barcodes from a random delivery docket as "not yours." A naive parser will silently file someone else's manifest into your folder. Recognizers built off your real samples carry capture-group constraints that throw the wrong-format match out.
- Carve a 200-page mixed PDF dump into individually-named documents. The hard part isn't reading the pages. It's knowing where document A ends and document B begins. That boundary lives in the recognizer set, not in a generic library.
- Park the unknown without crashing. Anything that doesn't match your recognizer set lands in UNKNOWN/ with a one-line note. You email me a sample, I update the kit, you reload. No "the script silently misfiled it under the wrong category for a month."
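To make the "capture-group constraints" idea concrete, here is a minimal sketch of what a recognizer can look like. It is not the shipped recognizer set: the document types, header phrases, and barcode formats (WO- plus six digits, DD- plus eight digits) are made up for illustration, and real ones would be built from your samples.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recognizer:
    """One document type: a header phrase plus a constrained barcode pattern."""
    doc_type: str
    header: re.Pattern
    barcode: re.Pattern  # capture group 1 is the document number


# Hypothetical recognizers for illustration only.
RECOGNIZERS = [
    Recognizer(
        doc_type="work_order",
        header=re.compile(r"\bWORK ORDER\b", re.IGNORECASE),
        barcode=re.compile(r"\bWO-(\d{6})\b"),
    ),
    Recognizer(
        doc_type="delivery_docket",
        header=re.compile(r"\bDELIVERY DOCKET\b", re.IGNORECASE),
        barcode=re.compile(r"\bDD-(\d{8})\b"),
    ),
]


def classify(page_text: str) -> tuple[str, Optional[str]]:
    """Return (doc_type, document_number), or ("UNKNOWN", None) if nothing fits."""
    for rec in RECOGNIZERS:
        # Both the header and the barcode format must agree before we file it.
        if not rec.header.search(page_text):
            continue
        match = rec.barcode.search(page_text)
        if match:
            return rec.doc_type, match.group(1)
    return "UNKNOWN", None
```

The point of requiring both a header hit and a well-formed barcode is that a stray barcode from someone else's manifest fails the format check and falls through to UNKNOWN instead of being filed.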
The personalisation step takes me 60-90 minutes per customer because I've built the pipeline. It would take you days, because you don't have it, and you have to learn each pattern category as you hit it. /why-no-ai covers what runs on your machine and why no LLM ever touches your scans.
The edge cases will eat your weekend.
Real edge cases from real beta installs. None of these were in the v0.1 spec. All of them have shipped fixes now.
A non-exhaustive sampler
- Document boundaries inside one PDF. A 200-page scan from an MFC is rarely 200 pages of one document. The reader has to detect "new document starts here" reliably, off layout cues that vary per supplier. Generic libraries don't do this; recognizers personalised to your docs do.
- Duplex scans where one side is blank. The blank side has no recognizer hit. The reader has to detect it, attribute it to the previous page's document, and not start a new file. (We learned this from a clinic that scans every patient form duplex.)
- Stapled documents. Page 3 has a hole punched through the recognizer's anchor area. Now what? Answer: fallback recognizers recover it 80% of the time; the other 20% of the time the page lands in /unknown/ with a one-line note.
- Photocopied pages that are slightly grey. Toner-poor MFCs render at 30% black instead of 100%. The reader has to threshold adaptively. Two days of tuning.
- Watch-folder race conditions. User drops a 200 MB PDF; the OS reports it modified before the write finishes. Your reader opens a half-flushed file, reads zero pages, tries to delete the source, panics. Solution: a 2-second size-stable check before processing. Looks trivial in retrospect; ate four hours.
- Cross-platform path separators. Customer on Windows mapped a network drive that contains a Mac's hidden .DS_Store files. Watchdog emits events for them. Filter, don't crash, don't try to scan a 0-byte hidden file as a TIFF.
- Locked PDFs from older Acrobat versions. The library throws a cryptic PdfReadError. You need to catch it specifically, log it, and route the file to /locked/ with a human-readable note that says exactly which file and what to do.
- Licence key rotation while the app is running. User's monthly subscription auto-renews; the new key arrives by email; the running app still trusts the old key. Hot-reload of the licence without restart is a 30-line fix that took a 3-hour incident to find.
- Apple's quarantine attribute on first launch. Sequoia silently deletes unsigned .app bundles after 60 seconds. Install instructions need xattr -dr com.apple.quarantine before the user double-clicks. Lost three trial customers learning this.
- HEIC scans from iPhone copies. Some users photograph the document with an iPhone instead of scanning. HEIC support requires a specific pillow-heif backend on Linux. Ship the wheel or fall back gracefully.
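Two of the watcher items above (the half-written 200 MB PDF and the hidden .DS_Store files) can be sketched with the standard library alone. This is an illustration, not the product's code: the function names, the extra Windows file names, and the polling and timeout defaults are my assumptions. Only the 2-second stability window comes from the list above.

```python
import time
from pathlib import Path

# .DS_Store is the case from the list; the other names are assumed extras.
HIDDEN_NAMES = {".DS_Store", "Thumbs.db", "desktop.ini"}


def should_skip(path: Path) -> bool:
    """True for hidden/system files and empty files that should never be scanned."""
    if path.name.startswith(".") or path.name in HIDDEN_NAMES:
        return True
    try:
        return path.stat().st_size == 0
    except OSError:
        return True  # vanished or unreadable: leave it alone


def wait_until_stable(path: Path, stable_for: float = 2.0,
                      poll: float = 0.5, timeout: float = 120.0) -> bool:
    """Return True once the file size has not changed for `stable_for` seconds."""
    deadline = time.monotonic() + timeout
    last_size = -1
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        try:
            size = path.stat().st_size
        except OSError:
            return False  # file disappeared mid-write
        now = time.monotonic()
        if size != last_size:
            # Still growing (or first look): restart the stability clock.
            last_size = size
            stable_since = now
        elif now - stable_since >= stable_for:
            return True
        time.sleep(poll)
    return False
```

In a watcher callback you would call should_skip first, then wait_until_stable, and only open the file when the second returns True. A size check is a heuristic: a stalled network copy can look stable, which is why the reader still has to tolerate a PDF that fails to open.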
Rough rule from beta: every five customers surface one edge case the previous four didn't. The list above is the first 10 of about 40 fixes that aren't in the original codebase.
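The first two items in the sampler (document boundaries and blank duplex sides) come down to one grouping rule, sketched below. It assumes the text of each page has already been extracted, and starts_document is a hypothetical hook standing in for the recognizer set. Treating every unrecognised non-blank page as a continuation is a simplification I chose for the sketch.

```python
from typing import Callable, Optional


def group_pages(
    page_texts: list[str],
    starts_document: Callable[[str], Optional[str]],
) -> list[dict]:
    """Split a scan into documents.

    `starts_document` is a hypothetical recognizer hook: it returns a document
    type when a page looks like the first page of a new document, else None.
    """
    documents: list[dict] = []
    for index, text in enumerate(page_texts):
        is_blank = not text.strip()
        doc_type = None if is_blank else starts_document(text)
        if doc_type is not None:
            # A recognizer hit opens a new document.
            documents.append({"type": doc_type, "pages": [index]})
        elif documents:
            # Blank sides and continuation pages stay with the previous document.
            documents[-1]["pages"].append(index)
        else:
            # Nothing recognised yet: park leading pages rather than guess.
            documents.append({"type": "UNKNOWN", "pages": [index]})
    return documents
```

Each returned entry holds a document type and the zero-based page indexes that belong to it, which is enough to drive a splitter that writes one output file per document.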
Your time costs more than the subscription.
If you're an engineer reading this, your hourly rate is the comparison. Not zero, not $50. Your real rate.
Conservative back-of-the-envelope
Pretend you're an experienced backend dev at $120/hr ($150/hr loaded with super, leave, equipment). What does the DIY version actually cost?
Recognizers tuned to your docs: ~14 hr (and ongoing)
Watcher + queue + retry: ~6 hr
Cross-platform install: ~10 hr (this is the trap)
Folder library + UX: ~8 hr
Licence + entitlement check: ~4 hr
Edge cases (40 × 30 min avg): ~20 hr (spread over the first year)
Documentation + onboarding: ~6 hr
..................................
Year-1 build: ~68 hr × $150/hr = $10,200
The Matrix single tier:
A$29/mo × 12 = ~A$348/yr
Break-even: the subscription is cheaper than the build from month 1. And if it doesn't fit your work, the 14-day money-back guarantee gets your money back.
And that's against an experienced dev who already has a clean Saturday. If you have to learn the document-recognition space from scratch, double the build estimate. If you have a job, you don't have those Saturdays. You have nights, and your night-time hourly rate to your family is much higher than $150.
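If you'd rather check the arithmetic than trust it, here is the estimate as a small script. It sums the line items as listed, and the learning_factor parameter models the "double the build estimate" case. The names are mine, and it assumes the hourly rate and the subscription price are in the same currency.

```python
# Hours per line item, as listed in the estimate above.
BUILD_HOURS = {
    "recognizers": 14,
    "watcher_queue_retry": 6,
    "cross_platform_install": 10,
    "folder_library_ux": 8,
    "licence_check": 4,
    "edge_cases": 40 * 0.5,  # 40 fixes at 30 minutes each
    "docs_onboarding": 6,
}
LOADED_RATE = 150  # per hour, loaded rate from the estimate
MONTHLY_FEE = 29   # subscription price per month


def build_cost(hours: dict, rate: float, learning_factor: float = 1.0) -> float:
    """Total build cost; learning_factor=2.0 models learning the space from scratch."""
    return sum(hours.values()) * rate * learning_factor


def months_of_subscription(cost: float, monthly_fee: float) -> float:
    """How many months of subscription the same money would buy."""
    return cost / monthly_fee


if __name__ == "__main__":
    cost = build_cost(BUILD_HOURS, LOADED_RATE)
    print(f"Build hours: {sum(BUILD_HOURS.values()):.0f}")
    print(f"Build cost: {cost:,.0f}")
    print(f"Subscription per year: {MONTHLY_FEE * 12}")
    print(f"Months of subscription for the same money: "
          f"{months_of_subscription(cost, MONTHLY_FEE):.0f}")
```

Swap in your own hours and rate; the conclusion only changes if your build estimate drops by well over an order of magnitude.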
The honest version: the ten engineers who have asked us this question on forums have, between them, shipped zero production filing tools.
Build it if it's interesting. Buy it if you need it shipped.
If filing is your hobby project, please have at it. The recognizer thinking on the why-no-ai page is yours, the edge case list above is yours, and we'd love to read your write-up.
If filing is the thing standing between you and your weekend, A$29/month is less than a coffee a day. Email your sample docs, run it on a real day's paper, and if it doesn't fit, the 14-day money-back guarantee has you covered.