← Free tools · Tool 01

HDC Note Codebook

Turn 10,000 markdown files into a 6-codebook hyperdimensional index that runs on a laptop. Same library we use internally to make Claude remember our notes from last quarter. CPU-only, ~20 MB on disk for 10K notes, MIT license.

Why we built it. Off-the-shelf embeddings are slow and memory-hungry on big personal corpora. HDC (hyperdimensional computing) gives you sub-second cleanup retrieval on a CPU with 1–5 KB per note. We use the same library to drive every retainer client's private assistant.

Quick start · 90 seconds

Index 10K notes, ask a question.

# clone and install
$ git clone https://github.com/squatch-c-c/hdc-note-codebook
$ cd hdc-note-codebook && pip install -e .

# index a folder of markdown
$ codebook index ~/notes --out notes.codebook

[ok] scanned 9,438 files
[ok] built 6 codebooks (action / entity / topic / time / cite / mood)
[ok] wrote notes.codebook (19.2 MB)

# ask a question
$ codebook ask notes.codebook "what did I write about retainers in March?"
 retainer-pricing-v2.md  0.84
 march-leads.md          0.71
 tom-jerick-call.md      0.62

Works on Python 3.10+. Pure NumPy. Optional: integrates with Claude Code / Cursor / Continue.dev via the included MCP server.

What's in the box

Six codebooks, one CLI.

  • action-graph codebook — verbs and outcomes (the one with the 5/5 cleanup pilot score)
  • entity codebook — people, products, projects
  • topic codebook — conceptual buckets
  • time codebook — rolling temporal index
  • citation codebook — quotes, links, sources
  • mood codebook — sentiment / voice fingerprint

All six codebooks pack into one ~20 MB file per 10K notes. Loads in milliseconds. Update incrementally; no full re-index when you add a file.

When it pays off

Three signs you need this.

  1. Your notes folder has >2,000 files.
  2. You've tried embedding-based search and it eats RAM / fees.
  3. You want an AI assistant that reads your private notes without sending them anywhere.

Want it deployed for your team?

We integrate this for retainer clients in week one.

If you want the codebook running on your corpus, behind a private chat UI, with eval and ops dashboards — that's the on-prem AI co-pilot pilot ($24K, 14 days), or any retainer tier.

Just want updates?