Family Chief of Staff

Most AI agents demo well and live poorly. They handle a single task in a single moment, then forget everything. The promise of agentic AI is autonomy — a system that anticipates, manages, and improves on its own. The reality, so far, is a chatbot with a better marketing deck.

I decided to find out whether the promise can be made real. Not another personal assistant that answers when prompted, but an orchestration system that thinks ahead, manages experts, and gets better on its own.

The mental load

Every family has an invisible job nobody applied for. School forms past due. A birthday twelve days away with no gift. A rental car unbooked with twenty days to go. An email sent three days ago with no reply. Your partner is traveling and both kids have games at the same time on Saturday — which one do you go to?

The mental load is relentless — not because any single item is hard, but because the items never stop arriving. They live in your head until they surface as a missed deadline or a late-night scramble.

I've been building a family operating system for a month. It has a daily digest, a Telegram bot, email processing. But all of those are reactive. None of them anticipate. None of them wake up at 5:30 AM and say: here are the three things that will bite you today.

So I promoted the system.

The job

A chief of staff doesn't answer questions. They make sure you never have to ask them.

The job is orchestration. Route the right information to the right person at the right time. Prioritize ruthlessly — not everything that's due matters, and not everything that matters is due. Synthesize across domains that don't naturally talk to each other. Know what the family actually cares about. Be discreet. Be invisible. When everything runs smoothly, nobody notices the system exists.

That's the design spec. An agent that thinks like a trusted advisor who happens to know your kids' schedules, your tax situation, your in-laws' birthdays, and which restaurant your wife liked in that city three years ago.

Three tiers of thinking

The system runs at three depths, because not everything deserves the same attention.

The pulse runs every hour. Peripheral vision — calendar events approaching, urgent emails, school notifications. It surfaces only what's time-sensitive.

The morning sweep runs before dawn. The analytical layer. It scans calendars, email, task lists, trip plans, reminders, and school deadlines to build a picture of what's open, approaching, and drifting. It drafts birthday messages, writes project plans, and sends a Telegram briefing. My morning coffee comes with a status report I didn't have to think about.

The weekly review runs Monday mornings with a more powerful reasoning model. Strategic thinking — project tracking, logistics balance between partners, schedule conflicts, research queues. Then it reviews its own performance. More on that below.
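Under the hood, the tiers are just three scheduled jobs with different scopes and different models. Here is roughly how I'd sketch the configuration; the times, model labels, and scope names are illustrative placeholders, not the exact wiring.

```python
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    cadence: str                                     # when the tier runs
    model: str                                       # which class of model handles it
    scope: list[str] = field(default_factory=list)   # what it is allowed to look at

# Illustrative definitions of the three tiers.
TIERS = [
    Tier("pulse", "every hour", "fast",
         ["calendar", "urgent_email", "school_notifications"]),
    Tier("morning_sweep", "daily at 05:30", "standard",
         ["calendar", "email", "tasks", "trips", "reminders", "school_deadlines"]),
    Tier("weekly_review", "mondays at 06:00", "reasoning",
         ["projects", "logistics_balance", "schedule_conflicts",
          "research_queue", "self_review"]),
]
```

The split is mostly about cost and attention: the hourly pass stays cheap and narrow, and the expensive reasoning happens once a week.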

The expert network

A chief of staff doesn't do everything personally. They know who to call.

The system coordinates specialists — a wealth advisor, a tax advisor, legal counsel, a travel agent. Not separate chatbots sitting in a waiting room. Modes the agent shifts into when domain expertise matters, each working autonomously within its lane.

A tax deadline approaches — the system researches deductions and drafts a summary. Spring break is twenty days out with hotels unbooked — it compares options and presents a recommendation. A lease renewal arrives — it surfaces key terms and deadlines before they become urgent.

Each expert does as much as it can on its own and stops where a human decision is needed. The output is near-complete work, not a suggestion to go do homework.
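Mechanically, the expert network can be as simple as a routing table: an event type maps to a mode, and the mode decides how far to go before stopping. The event names and mode labels here are hypothetical.

```python
# Hypothetical routing table mapping event types to expert modes.
EXPERTS = {
    "tax_deadline": "tax_advisor",
    "trip_unbooked": "travel_agent",
    "lease_renewal": "legal_counsel",
    "account_review": "wealth_advisor",
}

def expert_for(event_type: str) -> str:
    """Pick the expert mode for an event; anything unrecognized stays with the generalist."""
    return EXPERTS.get(event_type, "chief_of_staff")
```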

The guardrails are simple: the system can research, draft, plan, and notify. It cannot send a message on my behalf, RSVP to an invitation, or move money. Every external action stops at a draft — then waits for a human to say "do it."
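That gate fits in a few lines. A sketch of the idea, assuming a single dispatch point for actions; the action names and the approval flag are mine, not the system's exact API.

```python
from enum import Enum, auto

class Action(Enum):
    RESEARCH = auto()
    DRAFT = auto()
    PLAN = auto()
    NOTIFY = auto()
    SEND_MESSAGE = auto()   # external
    RSVP = auto()           # external
    MOVE_MONEY = auto()     # external

# What the system may do entirely on its own.
AUTONOMOUS = {Action.RESEARCH, Action.DRAFT, Action.PLAN, Action.NOTIFY}

def execute(action: Action, payload: dict, approved: bool = False) -> str:
    """Run autonomous actions directly; park every external action as a draft."""
    if action in AUTONOMOUS:
        return f"done: {action.name.lower()}"
    if approved:
        return f"executed with approval: {action.name.lower()}"
    # External actions stop here until a human says "do it".
    return f"draft queued: {action.name.lower()}"
```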

Getting smarter

This is the part that makes it an experiment and not just automation.

The weekly review analyzes itself. Which alerts were useful? Which drafted messages got sent as-is and which needed editing? Which plans were acted on and which ignored? Each answer feeds back. A tone gap in a birthday message means the next draft pulls more personal context. An ignored alert means the threshold adjusts. Each correction sharpens the next cycle.
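The bookkeeping behind that loop doesn't need to be clever. Assuming each week's outputs get logged with whether they were used and whether they were edited, the review can score itself before proposing changes; the field names here are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    kind: str        # "alert", "draft", or "plan"
    acted_on: bool   # did a human actually use it?
    edited: bool     # was it changed before use?

def weekly_scores(outcomes: list[Outcome]) -> dict[str, float]:
    """Summarize last week's output so the review can propose concrete adjustments."""
    def rate(kind: str, keep) -> float:
        items = [o for o in outcomes if o.kind == kind]
        return sum(keep(o) for o in items) / len(items) if items else 0.0

    return {
        "alert_hit_rate": rate("alert", lambda o: o.acted_on),      # low -> raise the alert bar
        "draft_as_is_rate": rate("draft", lambda o: not o.edited),  # low -> pull more personal context
        "plan_adoption_rate": rate("plan", lambda o: o.acted_on),
    }
```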

The system also learns the family. Every digest reply, every Telegram correction, every question that reveals a wrong assumption accumulates into a richer profile. What my wife cares about. Which reminders I act on. What "urgent" actually means to us. The context compounds.
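One way to keep that profile is an append-only log that gets folded back into the prompts. A sketch, with the file name and fields as placeholders:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

PROFILE = Path("family_profile.jsonl")  # hypothetical append-only context store

def record_signal(source: str, observation: str) -> None:
    """Append one learned preference or correction to the family profile."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "source": source,            # e.g. "telegram_correction", "digest_reply"
        "observation": observation,  # e.g. "reminders land better the night before"
    }
    with PROFILE.open("a") as f:
        f.write(json.dumps(entry) + "\n")
```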

The whole loop is reinforcement learning in spirit rather than in the textbook sense: not gradient descent, but the practical kind where a system gets feedback, adjusts, and tries again tomorrow. The reasoning models are sophisticated enough to do this now. They can read their own output, compare it to what happened, and propose specific changes. Whether they're good at it is what I'm about to find out.

The model is the engine. The context is the fuel. Every day adds more fuel.

If it works, the system fades into the background. Birthdays don't sneak up. Deadlines don't drift. The mental load has somewhere to live that isn't someone's head. And nobody notices the machinery — which is exactly how you know it's working.