Who Owns Your AI Memory? Because It Probably Isn’t You.
I spent three years feeding ChatGPT context. Then I realized the memory it built wasn’t mine. So I took it back - with a vector DB on a Mac Mini.
1. Introduction - The version of me inside ChatGPT does not exist anymore
Okay so. I was using ChatGPT the other day and it gave me advice that would have been great - if I were still the person I was in 2023.
Same thing happened the week before. And the month before. I have been using ChatGPT since late 2022, and somewhere around the two-year mark I started feeling it: the model was responding to a version of me that no longer exists. Old priorities. Old preferences. Projects I had already killed. Opinions I had reversed.
I tried to fix it inside ChatGPT. I could not. I could not inspect what it remembered. I could not reliably overwrite it. The memory layer was there, somewhere, but it was the vendor’s copy of me, not mine.
Try this for yourself. Ask ChatGPT: “what do you actually remember about me?” You will get a tidy summary. Your name, your job, that you like short emails. That is the shallow layer. The “nice to meet you” layer. The real memory lives underneath - the decisions, the reasoning, the way you have changed your mind over two years of conversations. And none of that belongs to you. It lives inside somebody else’s database, behind a chat interface that hands you the summary and hides everything else.
So I did something that a year ago would have sounded crazy: I built my own memory system. A local vector database on a Mac Mini, plugged into Claude Code, Claude Desktop, ChatGPT, and every MCP-aware tool I use. One brain, many clients. I control what goes in. I control what stays.
This article is not really about the code. It is about the question underneath it: who owns the context of your life with AI? If the answer is “a company whose retention policy I have not read,” that is worth sitting with.
2. Perspective - The lock-in nobody talks about
Here is the thing most people miss.
At the macro level, the top AI models have converged. Blindfold-test Claude, ChatGPT, and Gemini on a hundred real tasks and most people could not tell which is which. The differences are in the micro - tone, edge cases, specific reasoning quirks. Real differences, yes, but not the reason you keep coming back to one tool every day.
So what actually locks you in? Not the model. The model is interchangeable. What keeps you in place is the memory the tool has accumulated about you. That is the moat. And every major AI vendor is quietly building it higher while everyone argues about benchmark charts.
People usually only notice this when they try to switch. They open a new tool, realize it does not know them, feel the friction of re-briefing, and go back. The memory did not just help them - it trapped them. And they call it “preference” because it feels like their choice.
The part I want you to think about: the AI market is not stable. It is not browsers where Chrome won. It is not search where Google won. We are going to be switching tools. We are going to be running three, four, five AI products in parallel for years - one for coding, one for writing, one for research, some that do not exist yet. If your memory lives inside one of them, every switch costs you context. Every new tool starts from zero.
Back to ChatGPT. The reason it kept quoting 2023-me at me was not that it was dumb. It was that its memory was append-only from my side. I could add. I could not really curate. When old facts and new facts conflicted, it often went with the older one because that version had more repetitions backing it up. I was the product manager. I had no admin panel.
So I stopped thinking of memory as a feature of the tool. I started thinking of it as an asset of mine that I had stupidly let the tool hold for me.
I call the alternative BYOM - Bring Your Own Memory.
The analogy is BYOD (bring your own device) and BYOK (bring your own key). In BYOM the vendor brings the model. You bring the memory. The two meet at an open protocol - in my case MCP (Model Context Protocol). The vendor does what vendors are good at: train huge models, run them cheap, ship them fast. You do what only you can do: own the truth about yourself.
Once memory lives behind a protocol instead of inside a product, everything changes. It becomes portable, inspectable, backup-able, forkable. You can delete entries. You can hand a copy to a new tool and it knows you before you have typed a word. The shift - from “feature inside a product” to “service I own” - is the whole point of this piece.
3. Gamify - How to build this in a weekend
Now the practical part. I am going to show you how the whole thing fits together and give you the actual commands. If you are a developer comfortable with Node and a terminal, you can be running by Sunday evening. About 1800 lines of JavaScript total. Five dependencies. No cloud services.
The architecture in one picture
Claude Code / Claude Desktop / your custom agents
        │  (MCP over STDIO)
        ▼
      mcp.js        ← thin STDIO bridge, spawned/killed freely
        │  (HTTP on 127.0.0.1:3110)
        ▼
     server.js      ← persistent Fastify daemon, owns the DB
        │
   ┌────┴───────────────┬──────────────────┐
   ▼                    ▼                  ▼
 db.js              embed.js           judge.js
 SQLite +           Ollama             (optional)
 sqlite-vec         nomic-embed        auto-tag
 + FTS5                                + rerank
Two processes, not one. This is the single most important structural decision, so I will explain why.
Claude Code and Claude Desktop spawn MCP servers as child processes over STDIO, and they kill those processes on every restart. If the SQLite database lived inside the STDIO process, every Claude restart would mean re-opening the DB, losing the WAL journal, risking race conditions with other clients. Bad.
So: a thin STDIO wrapper (mcp.js, ~40 lines) that Claude freely spawns and kills. A persistent HTTP daemon (server.js) that actually owns the database, stays up, and speaks plain HTTP on port 3110. The wrapper proxies calls to the daemon. Both share the same tool schemas from mcp-tools.js.
Bonus: because the daemon speaks HTTP, anything else can hit it too - custom agents, scripts, a web dashboard, future tools. One memory, many clients.
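To make the bridge idea concrete, here is a hypothetical sketch of how the wrapper could map an MCP tool call onto the daemon's HTTP surface. The `/tools/<name>` route and the `toDaemonRequest` helper are my illustration of the pattern, not the article's actual code:

```javascript
// Hypothetical sketch: every MCP tool call the STDIO wrapper receives
// becomes a plain HTTP POST to the local daemon. Route naming is
// illustrative; the real server may expose different paths.
const DAEMON = "http://127.0.0.1:3110";

function toDaemonRequest(toolName, args) {
  return {
    url: `${DAEMON}/tools/${toolName}`,
    options: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(args),
    },
  };
}

// Inside the wrapper this would be: fetch(req.url, req.options)
const req = toDaemonRequest("memory_search", { query: "writing style", limit: 2 });
console.log(req.url); // http://127.0.0.1:3110/tools/memory_search
```

The wrapper stays stateless by design: it holds no database handle, so Claude can kill and respawn it at will without touching the daemon.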
The stack - five dependencies and nothing else
Component, technology, and why:
- Language: JavaScript (CommonJS). No TypeScript, no build step. node server.js and done.
- HTTP: Fastify 5. Fast, minimal, handles JSON well.
- Database: better-sqlite3 + sqlite-vec + FTS5. One file, zero ops, ACID, vectors and text in one transaction.
- Embeddings: nomic-embed-text via Ollama. 768-dim, runs locally on M-series Mac, zero API keys.
- MCP SDK: @modelcontextprotocol/sdk. Official SDK, both STDIO and HTTP transports.
- Validation: Zod 4. One schema per MCP tool.
package.json dependencies: 5. That is the whole list.
The schema - the fields that actually matter
One SQLite file, three real tables plus two virtual. The core is entries:
CREATE TABLE entries (
id TEXT PRIMARY KEY,
parent_id TEXT, -- links a chunk to its parent entry
kind TEXT DEFAULT 'document', -- document | chunk | fact | preference | event
type TEXT NOT NULL, -- feedback | user | project | reference
title TEXT NOT NULL,
content TEXT NOT NULL,
content_hash TEXT NOT NULL, -- SHA-256, for exact dedup
tags TEXT, -- JSON array
source_tool TEXT NOT NULL, -- claude-code | claude-desktop | ...
source_reason TEXT NOT NULL, -- WHY this was written
source_session TEXT, -- which session produced it
confidence REAL DEFAULT 1.0, -- 1.0 = human, 0.6 = auto, 0.3 = inferred
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL,
verified_at INTEGER, -- last confirmed still true
expires_at INTEGER, -- when to re-check
supersedes TEXT, -- ID of entry this replaces
superseded_by TEXT, -- ID of entry that replaced this
archived_at INTEGER, -- soft delete
access_count INTEGER DEFAULT 0 -- popularity signal for ranking
);
Simplified view. The full schema has ~22 columns plus a few indexes and an extra JSON field for arbitrary metadata. Above is what matters for understanding.
Two virtual tables sit on top: entries_fts (FTS5 for keyword search, with porter unicode61 tokenizer so it handles Cyrillic), and entries_vec (sqlite-vec, FLOAT[768] for embeddings).
Every field above earns its place. parent_id + kind are how chunking works: long content splits into chunks, each chunk is a row with kind: 'chunk' pointing at its parent, and embeddings live on the chunks so search can find the specific part instead of a diluted average of a long document. confidence lets the AI distinguish “user explicitly said this” from “I inferred this.” verified_at lets a maintenance agent mark old facts as still true. supersedes turns updates into an audit trail instead of destructive edits. access_count is a popularity signal the ranker uses.
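The chunking step can be sketched in a few lines. This is a minimal illustration of the parent/chunk relationship, assuming a naive paragraph-boundary split with a size cap; the field names follow the schema above, but `chunkEntry`, the 800-character cap, and the splitting strategy are illustrative, not the real code:

```javascript
// Illustrative sketch: split long content on paragraph boundaries,
// emit one row per chunk pointing back at the parent entry.
// Embeddings would then be computed per chunk, not per document.
function chunkEntry(entry, maxChars = 800) {
  const paras = entry.content.split(/\n\n+/);
  const pieces = [];
  let buf = "";
  for (const p of paras) {
    if (buf && buf.length + p.length > maxChars) {
      pieces.push(buf);
      buf = p;
    } else {
      buf = buf ? buf + "\n\n" + p : p;
    }
  }
  if (buf) pieces.push(buf);
  return pieces.map((content, i) => ({
    id: `${entry.id}_c${i}`,   // hypothetical chunk ID scheme
    parent_id: entry.id,
    kind: "chunk",
    content,
  }));
}

const chunks = chunkEntry({ id: "doc1", content: "a".repeat(500) + "\n\n" + "b".repeat(500) });
console.log(chunks.length); // 2
```

Search then matches the specific chunk, and parent_id lets you pull the surrounding document when the AI needs fuller context.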
And the part that sold me on SQLite in the first place: backup is one command. cp memory.db backup.db and you have a complete snapshot of your entire AI memory - text, vectors, audit log, everything, in a single 3.4 MB file. Try doing that with Pinecone. Try exporting your memory out of ChatGPT. This is what “you own it” looks like in practice - three-second backup, zero vendor involvement, a file you can git-commit if you want version history. SQLite also runs in WAL mode, which is why Claude Code, Claude Desktop, and my background maintenance agent can all read the database concurrently without stepping on each other.
Embeddings: Ollama locally, and the one detail nobody mentions
Install Ollama, pull the model, and embeddings run on your laptop forever free:
brew install ollama
ollama pull nomic-embed-text
The flow: text -> POST http://127.0.0.1:11434/api/embeddings -> Float32Array(768) -> L2 normalize -> write to SQLite.
The L2 normalization step is the detail that costs two hours of debugging if you skip it. sqlite-vec computes Euclidean distance, but if your vectors are unit-length, Euclidean distance equals cosine similarity (cos_sim = 1 - L2² / 2). Without normalization your results feel vaguely wrong. With it they feel right.
function l2Normalize(vec) {
let sumSq = 0;
for (let i = 0; i < vec.length; i++) sumSq += vec[i] * vec[i];
const norm = Math.sqrt(sumSq) || 1;
const out = new Float32Array(vec.length);
for (let i = 0; i < vec.length; i++) out[i] = vec[i] / norm;
return out;
}
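The identity is easy to verify numerically. A self-contained check (normalize() here mirrors l2Normalize above, written with plain arrays for brevity):

```javascript
// Numeric check: for unit-length vectors, cos_sim = 1 - L2² / 2.
function normalize(v) {
  const n = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
  return v.map((x) => x / n);
}

const a = normalize([3, 4, 0]);
const b = normalize([0, 4, 3]);

const cos = a.reduce((s, x, i) => s + x * b[i], 0);          // cosine similarity
const l2sq = a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0);  // squared Euclidean

console.log(Math.abs(cos - (1 - l2sq / 2)) < 1e-12); // true
```

So a nearest-neighbor search by Euclidean distance over normalized vectors returns exactly the cosine-similarity ranking, which is what sqlite-vec ends up doing here.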
Real numbers from my running system: 441 ms per embed, 3 ms for FTS5 search, 3 ms for vector search. The bottleneck is embedding, not search.
Search: hybrid, filterable, fast
Three modes: keyword only, semantic only, hybrid (default). Hybrid runs FTS5 and vector search in parallel, then fuses the results with Reciprocal Rank Fusion:
const RRF_K = 60;
const scores = new Map();
ftsHits.forEach((h, rank) => {
  scores.set(h.id, (scores.get(h.id) || 0) + 1 / (RRF_K + rank));
});
vecHits.forEach((h, rank) => {
  scores.set(h.id, (scores.get(h.id) || 0) + 1 / (RRF_K + rank));
});
Entries that show up in both lists get a higher final score. Then I layer post-fusion boosts - confidence multiplier, access-count boost, recency bump, expiry decay. A handful of lines each.
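The post-fusion boosts can be sketched as a single function. The multipliers below are illustrative tuning values of my own choosing, not the article's exact weights; the point is the shape, layered on top of the RRF score:

```javascript
// Sketch of post-fusion ranking boosts. Weights are illustrative.
function boost(entry, rrfScore, now = Date.now()) {
  let s = rrfScore;
  s *= entry.confidence;                           // trust human-stated facts more
  s *= 1 + Math.log1p(entry.access_count) * 0.05;  // popularity bump
  const ageDays = (now - entry.updated_at) / 86_400_000;
  if (ageDays < 30) s *= 1.1;                      // recency bump
  if (entry.expires_at && now > entry.expires_at) s *= 0.5; // expiry decay
  return s;
}
```

Each boost is a small multiplier rather than a hard filter, so a stale-but-relevant entry can still surface; it just has to out-score fresher material.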
But the part that makes this actually beat vendor memory is the filters. Every search call accepts type, tags, source_tool, confidence, date ranges. My top tags right now: personal-brand (54), active (52), substack (40), profile (34). Each one a slice I can narrow into with a single filter.
The concrete example that made this click for me: working on Temporal.day, I do not need the AI to recite my age and job title. I need it to surface which pricing models I tried, which I killed, and why. type: project, tags: ["temporal-day", "decision"] returns the list in milliseconds. Built-in memory cannot do that. The fidelity lives in the tags.
Write discipline: the rule that keeps the memory clean
Every write hits a validator before it touches the DB. Required fields (type, title, content, source_tool, reason). Min content length 20 chars. Real-character check to block junk. Then two-stage dedup:
1. Exact: SHA-256 hash of content. On a match, return 409.
2. Fuzzy: embed the new entry, run vectorSearch(embedding, 1), and if cosine similarity > 0.85, return a similar_exists warning with the existing ID.
The AI client can either force the write with confirm_duplicate: true or, better, call memory_update on the existing record. This single rule cut my duplicate rate from “everywhere” to “basically never.”
Plus rate limiting (60 writes/minute per tool) and an append-only write_log that records every mutation with a 100-char snippet. You always know who wrote what and why.
The nine MCP tools the AI actually uses
The whole interface is nine tools, registered via @modelcontextprotocol/sdk:
memory_describe - Self-description: current vocabulary, rules, config. Called first in every session.
memory_search - Hybrid search with filters (type, tags, source_tool, confidence, dates).
memory_get - Fetch one entry by ID.
memory_list - Recent entries, newest first.
memory_write - Create an entry. Runs dedup, chunking, optional auto-tag.
memory_update - Update an entry. Re-embeds if content changed.
memory_verify - Confirm an entry is still accurate (stamps verified_at = now).
memory_delete - Soft-delete (sets archived_at).
memory_supersede - Replace an old entry with a new one, linked via the supersedes chain.
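The supersedes chain deserves one concrete illustration, because it is what makes reads safe after updates: resolving any ID walks forward to the live entry while the full history stays queryable. A sketch with plain objects; `resolveLive` and the sample IDs are hypothetical:

```javascript
// Illustrative: follow superseded_by links until the live entry.
// History is never destroyed, only linked past.
function resolveLive(entries, id) {
  const byId = new Map(entries.map((e) => [e.id, e]));
  let cur = byId.get(id);
  while (cur && cur.superseded_by) cur = byId.get(cur.superseded_by);
  return cur;
}

const history = [
  { id: "a1", content: "pricing: $9/mo", superseded_by: "a2" },
  { id: "a2", content: "pricing: $12/mo", supersedes: "a1", superseded_by: "a3" },
  { id: "a3", content: "pricing: usage-based", supersedes: "a2" },
];

console.log(resolveLive(history, "a1").id); // a3
```

Asking for the oldest entry still lands you on the current truth, and walking the chain backwards gives you the audit trail of how the fact evolved.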
Self-describing: how the AI learns to use it without you micromanaging
memory_describe returns the live vocabulary - which type values already exist and how often, which tags are popular, what the validation thresholds are, which filters are available. Right now mine returns: feedback (31), reference (29), project (26), user (25) as the top types.
Why it matters: when Claude sees feedback has 31 entries, it uses feedback instead of inventing feedback-notes or user-feedback-misc. The vocabulary converges organically. No hardcoded taxonomy. No drift across tools.
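The vocabulary half of memory_describe is just aggregation over existing rows. A sketch of the idea (the `vocabulary` helper and output shape are illustrative, not the real response format):

```javascript
// Illustrative: count existing type values so clients reuse the most
// popular ones instead of inventing near-duplicate taxonomies.
function vocabulary(entries) {
  const counts = {};
  for (const e of entries) counts[e.type] = (counts[e.type] || 0) + 1;
  return Object.entries(counts)
    .sort((a, b) => b[1] - a[1])
    .map(([type, n]) => ({ type, n }));
}

const vocab = vocabulary([
  { type: "feedback" }, { type: "feedback" }, { type: "project" },
]);
console.log(vocab[0]); // { type: 'feedback', n: 2 }
```

Surfacing the counts, not just the names, is what makes convergence work: a client picks the heavily-used value because the numbers tell it which label is canonical.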
The fifteen-line prompt that wires it into Claude
The server is half. The other half is ~/.claude/CLAUDE.md, which Claude Code reads automatically:
## Memory
Primary memory is `personal-memory` MCP (HTTP :3110).
Always use memory_* MCP tools:
- Read: memory_describe (first call per session), then memory_search
- Write: memory_write with source_tool: "claude-code"
- Update stale facts: memory_update + memory_verify
- Supersede: memory_supersede when an old entry is replaced
### What to save
- Stable facts about user, projects, preferences
- Explicit feedback ("don't do X, prefer Y")
- Reference paths, endpoints, decisions and reasoning
### What NOT to save
- Anything in current code/git/files
- Ephemeral task state
- Conversation summaries
Without this file, the AI does not know the memory exists. With it, the behavior is automatic on every session. The same prompt works for Claude Desktop, custom agents, anything MCP-aware.
The actual setup - copy-paste this
# 1. Install Ollama and the embedding model
brew install ollama
ollama pull nomic-embed-text
# 2. Create the project
mkdir -p ~/.personal-memory/server && cd ~/.personal-memory/server
npm init -y
npm install better-sqlite3 sqlite-vec fastify @modelcontextprotocol/sdk zod
# 3. Create .env
cat > ../.env << 'EOF'
OLLAMA_URL=http://127.0.0.1:11434
EMBED_MODEL=nomic-embed-text
PORT=3110
HOST=127.0.0.1
EOF
# 4. Write the code:
# server.js, mcp.js, mcp-tools.js, db.js, embed.js, search.js, rules.js
# (architecture above tells you what each does)
# 5. Start the daemon
node server.js
# 6. Sanity check
curl http://127.0.0.1:3110/health
Then register it in Claude Code (~/.claude/settings.json) and Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"personal-memory": {
"command": "/path/to/node",
"args": ["/Users/you/.personal-memory/server/mcp.js"]
}
}
}
Drop the CLAUDE.md snippet from above into ~/.claude/CLAUDE.md. Restart Claude. You are running.
What it looks like when it works
A real memory_search response from my system, query = “writing style”, hybrid mode, limit = 2:
{
"results": [
{
"id": "feedback_mnxikwm3_588da6e4b3c1",
"type": "feedback",
"title": "Use short dashes, not em dashes",
"tags": ["writing", "style", "substack"],
"confidence": 1, "access_count": 84, "score": 0.0391
},
{
"id": "feedback_mntv57h4_c0281c034edb",
"type": "feedback",
"title": "Keep Substack Notes short (100-150 words)",
"tags": ["writing", "substack", "length"],
"confidence": 1, "access_count": 62, "score": 0.0358
}
],
"timing": { "fts_ms": 3, "embed_ms": 441, "vec_ms": 3 }
}
Two entries. Both tagged writing and substack. Together they have been retrieved 146 times - which means 146 times across months of sessions my AI tools already knew how I want to write, and I did not type a single reminder.
One more thing: let an AI maintain the memory for you
I do not curate entries by hand. A separate workflow - an agent I call OpenClaw - reads the memory periodically, finds stale entries, and either verifies them, updates them, or supersedes them. It uses the same MCP tools every other client does. It writes back with source_tool: "openclaw-agent". Every maintenance action ends up in the audit log.
This is the piece that turns a database into something alive. Facts do not need humans to stay fresh. They need another process with the same API access as the humans.
Closing
The technical part is the easy part. A hundred entries, 3.4 MB, 3 ms searches, five dependencies, zero API keys, one command to back up. A weekend of work and you are running.
The harder part is the question I want you to actually sit with.
Who controls how you show up to AI?
For most people right now the answer is: the vendor whose product they use most. And people push back on this with “well, I can always ask it what it knows about me.” So try it. You will get the shallow layer - your name, your job, a couple of preferences. What you will miss is everything that actually matters. The project decisions. The half-formed opinions. The pattern of how you change your mind. The texture.
Memory lives in the details, and details only surface when you can slice them - by tag, by project, by time, by confidence. A chat interface flattens. A vector database with filters does not. And no built-in memory is going to close that gap for you, because closing it would mean handing you the tools to leave.
If you are fine with where your context lives right now, fine. That is a real choice.
But I do not think most people have actually made the choice. They defaulted into it, because the tool is convenient and the cost of the default is invisible until the day they want to switch.
We are early. I think each of us is eventually going to carry a digital twin - a persistent, portable memory of who we are that any AI can plug into and understand us immediately. That is where this is going. We are just not taking memory seriously enough yet to see it.
Worth fixing. Bring your own memory.
If you want the full source layout, the schema, or the exact setup commands, reply to this post and I will send the detailed build guide.

