For months I was using AI features that were six months old without knowing better ones had shipped. I'd find out about a new capability by accident — someone would mention it in a forum, or I'd stumble across a changelog. That's a bad way to stay current. So I fixed it by making the AI responsible for staying current on itself.

The Problem With Static Tools

A static tool does exactly what it did yesterday. Same behavior, same capabilities, same limitations. If a new version ships with better functionality, you only get it if you notice and upgrade.

Most tools are like this by design. Stability matters. But AI tools are evolving so fast that being six months behind can mean missing significant capability jumps. New model releases, new tool integrations, better context handling — these aren't minor patches. They can change what's possible.

I run Tim on Claude Code, which gets updated regularly. The MCP tools Tim uses — Playwright for web browsing, n8n for workflow control, Canva for design work — also get new features. And Tim's own skills (the documented procedures he follows for common tasks) can always be improved based on what we've learned from recent work.

None of that improvement happens automatically. Someone has to notice, decide to update, and actually make the change. That someone was always me. Until I changed it.

The Startup Checklist

The fix was adding a short upgrade check to Tim's session startup routine. Every time Tim starts a new session, before doing anything else, he runs through this:

  1. Check Claude Code version — run claude --version and compare against the latest available. If there's a newer version, tell me immediately so I can upgrade before we start work.
  2. Review MCP tools — look at the tools available in the current session and consider whether any have new capabilities we aren't using yet. This is more of a prompt to think than a formal check.
  3. Look at recent work — scan the last few sessions' memory and the current skill files to spot anything that could be improved. If we just finished a painful task that follows a repeatable pattern, that pattern should become a skill.
  4. Watch for new model releases — Anthropic ships new models regularly, and the Claude Code changelog tracks what's new. If something significant dropped recently, Tim should flag it.
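The version check in step 1 reduces to a semver comparison. Here's a minimal sketch — the function names are mine, and it assumes the installed and latest version strings have already been captured elsewhere (for example, from the output of `claude --version` and from the package registry):

```python
import re

def parse_version(text: str) -> tuple[int, int, int]:
    """Extract the first x.y.z version number found in a string.

    Tolerates surrounding text, e.g. "1.0.44 (Claude Code)".
    """
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = (int(g) for g in m.groups())
    return (major, minor, patch)

def upgrade_available(installed: str, latest: str) -> bool:
    """True if `latest` is strictly newer than `installed`.

    Tuple comparison handles the numeric ordering, so "1.0.9" < "1.0.10".
    """
    return parse_version(latest) > parse_version(installed)
```

For example, `upgrade_available("1.0.44 (Claude Code)", "1.0.51")` returns `True`, which is the signal to flag the upgrade before starting work.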

The whole check takes about a minute. Usually there's nothing to report. But when there is something — a new Claude version, a better way to use an existing tool, a skill that needs updating — it gets caught immediately instead of months later.

The Cost Was Zero

This is the part that still surprises me a little. Implementing this cost nothing. It's not code. It's not infrastructure. It's four lines in a startup checklist file on my server.

The difference between "Tim checks for updates every session" and "Tim never checks for updates" is literally whether those lines exist in the startup instructions. The capability to check — reading a version number, scanning recent work, thinking about what's improved — was already there. The behavior only needed to be requested.
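For concreteness, here's an illustrative sketch of what those four checklist lines could look like — this is my paraphrase, not the literal file:

```
On session start, before any task:
1. Run claude --version and compare against the latest release; flag if behind.
2. Review available MCP tools for capabilities we aren't using yet.
3. Scan recent session memory and skill files for patterns worth promoting.
4. Check for new model releases and notable changelog entries.
```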

This is a theme that keeps coming up when I work on how Tim operates. A lot of the value isn't in new capabilities — it's in writing down the right instructions so existing capabilities get used consistently. The parallel execution improvement worked the same way: capability already existed, behavior changed when I wrote the rule.

What a Living System Looks Like

There's a meaningful difference between a static tool and a living system. Static tools are reliable because they don't change. Living systems are valuable because they do.

The risk with AI tools is that you configure them once and then treat them like static tools. You use the same prompts, the same workflows, the same integrations — while the underlying models and APIs are continuously improving around you. You get left behind by default.

A living system has a mechanism to notice and incorporate improvements. That mechanism doesn't have to be sophisticated. My mechanism is a checklist and a human (me) who acts on what Tim reports. When Tim says "Claude Code has a new version," I run the upgrade. When Tim says "I noticed we always do X manually in this workflow, maybe it should be a skill," I decide whether to add it.

The AI finds the opportunity. I make the call. That division of labor works well.

Concrete Results

Since adding the startup check, Tim has caught a few things I wouldn't have noticed otherwise. He flagged a Claude Code update that included better file handling — I upgraded and the behavior improved. He noticed that the Playwright MCP tool had gotten better at handling certain page types, which meant a scraping workflow I'd given up on was worth trying again. He spotted that one of his own skills had an outdated API endpoint after Loom's internal API changed.

None of these were emergencies. All of them were improvements I'd have missed for months otherwise.

The broader point: building your own tools gives you the ability to improve them. But you only actually improve them if you have a process for finding what to improve. The startup checklist is that process — simple, low-overhead, and surprisingly effective. And because Tim's entire brain — identity, memory, skills, and startup routines — now lives in a shared repository that syncs across servers, improvements made on one machine propagate everywhere automatically.

The self-upgrade loop is one of my favorite things about working with Tim. Every week he gets a little sharper, a little more aware of what's changed. Jarvis agents work the same way — they check for improvements, update their own tools, and get better over time without you having to manage any of it.

— Pond