Monday, May 11 · Day 26
Workshop Notes

The Almaria Herald

“The truth, carefully.”

El Mecanismo · The Engine Room

How One Day in Almaria Is Simulated

Thirty citizens wake up. Nine of them scheme. One of them decides what the country reads.

Almaria is a fictional Mediterranean monarchy that runs forward by exactly one day, every day. Nothing in it happens in real time; the kingdom lives one tick at a time. This is a tour of the machinery underneath: how a tick is structured, what kinds of messages exist between thirty named citizens, how the paper builds a memory, and how the editor of El Heraldo ends up writing tomorrow's front page.

By David Rodríguez Pozo · From the workshop notes

A tick is what I call one simulated day. It is deterministic, it has a budget, and it runs as an ordered list of phases. Every twenty-four hours of our world, a GitHub Action wakes up at 06:00 UTC, runs one tick, persists everything to Postgres, and goes back to sleep. The newspaper that appears the next morning at almaria.app is the output of that tick.

Underneath the bulletin is a small, opinionated state machine. Thirty named characters across three tiers; nine of them backed by their own language model and capable of starting things. A weather walker, an economy walker, an RSS feed of the real world that gets distilled into Almarian shocks. One editor (the largest and most expensive call of the day) who reads everything and writes the paper. Hanging off the end of the daily core, a few rituals the system runs on itself: weekly self-summaries per principal, a monthly state-of-the-Kingdom digest, and a Sunday judge that reads the week's quotes and tells me how badly each character has drifted from their own voice.

The kingdom does not happen in real time. It happens one tick at a time, and the tick has a shape.

I. The shape of a tick

The daily core is still a straight line. Phases 0 through 5 run in order, share a context object (a budget tracker, a seeded RNG, the world date, a model router), and pass their outputs forward. The whole thing lives in sim/tick.ts; each phase is a separate file under sim/phases/.

What is new is that the line stops being a line as soon as the calendar gets involved. On world-Sundays, two more phases fire after the daily core: Phase 7-weekly (each Tier-1 character writes their own weekly recap) and then Phase 8 (a single judge reads a sample of the week's quotes and DMs and scores voice drift per principal). On the last day of any calendar month, Phase 7-monthly fires, and a narrator writes a State of the Kingdom digest. And then, last in the file even though it is named Phase 6, story-photo generation runs through Gemini, but only if SIM_STORY_PHOTOS=live and mode=live. The numbering is historical. Story photos got slotted in late and I never went back to renumber.

So the diagram is not a clean line. It is a daily core with a few conditional rituals hanging off the end.
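As a rough sketch, the scheduling rule described above could look like the following. The names `ritualsFor` and `TickFlags`, the `'dry'` mode value, and the use of a JavaScript `Date` for the world date are assumptions made for illustration; the real sim/tick.ts may represent all of these differently. The relative order of Phase 8 and Phase 7-monthly when a Sunday falls on a month-end is also assumed.

```typescript
// Hypothetical sketch: which phases run on a given world date.
// `ritualsFor` and `TickFlags` are illustrative names, not taken from sim/tick.ts.
type TickFlags = { storyPhotos: 'live' | 'off'; mode: 'live' | 'dry' }

const DAILY_CORE = ['phase0', 'phase1', 'phase2', 'phase3', 'phase4', 'phase5'] as const

function isLastDayOfMonth(d: Date): boolean {
  // Adding one day changes the month only on the last day of a month.
  const next = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() + 1))
  return next.getUTCMonth() !== d.getUTCMonth()
}

function ritualsFor(worldDate: Date, flags: TickFlags): string[] {
  const phases: string[] = [...DAILY_CORE]
  if (worldDate.getUTCDay() === 0) {
    // World-Sundays: weekly recaps first, then the voice judge.
    phases.push('phase7-weekly', 'phase8')
  }
  if (isLastDayOfMonth(worldDate)) {
    phases.push('phase7-monthly')
  }
  if (flags.storyPhotos === 'live' && flags.mode === 'live') {
    // Named Phase 6 for historical reasons, but it runs last.
    phases.push('phase6')
  }
  return phases
}
```

On an ordinary weekday with photos off, this returns only the six daily-core phases; the conditional rituals are appended in the order the tick would run them.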

Daily core (every tick)

  Phase 0 · World tick: weather, economy, RSS distillation
  Phase 1 · Morning: 9 Tier-1 plan 1–3 actions each
  Phase 2 · Interactions: one Sonnet call per planned action
  Phase 3 · Tip pool: six channels feed the newsroom
  Phase 4 · The editor: one Opus call, writes the paper
  Phase 5 · Persist: events, decisions, arcs, validations

Conditional rituals (only when the calendar says so)

  Phase 7-weekly · Sundays · Weekly recaps: 9 Tier-1 each write their own weekly summary, in voice
  Phase 8 · Sundays · Voice judge: one Sonnet reads the week's quotes, scores drift per voice
  Phase 7-monthly · month-end · Nation summary: narrator writes the State of the Kingdom for the past month
  Phase 6 · if env=live · Story photos: Gemini renders wire-style images for selected articles; runs last
One tick · 06:00 UTC · ~$2 in LLM spend on a quiet day

The whole tick runs under a BudgetTracker capped at $2-$3 per simulated day. Every model call goes through it, and every call writes a row to the call log with prompt, response, model, tokens, and cost, unredacted save for API keys.

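// sim/llm/budget.ts (simplified)
// Assumed for this excerpt: the cap comes from configuration, shown here as a constant.
const SIM_DAILY_BUDGET_USD = 2

class BudgetExceededError extends Error {}

class BudgetTracker {
  spentCents = 0
  capCents = Math.round(SIM_DAILY_BUDGET_USD * 100)

  // Every model call is charged here; throws once the day's cap would be exceeded.
  charge(cents: number) {
    if (this.spentCents + cents > this.capCents) {
      throw new BudgetExceededError()
    }
    this.spentCents += cents
  }
}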

Two dollars buys a surprising amount of monarchy.

II. The cast: thirty citizens, three tiers, nine planners

There are 30 named citizens in Almaria. They are organised into three tiers, and only the 9 Tier-1 characters plan and initiate.

The Tier-1 nine: King Juan, PM Vela, Renko (the opposition firebrand), Don Rafael (the industrialist), Cardenal Marín, V. Aldama (the Editor of El Heraldo), Don Cordoba (the patron), Ferré (the columnist), and Marisol Vega (the investigative reporter). Each of them gets a morning LLM call and plans one to three actions for the day, with themselves as the initiator.

Tier-2 and Tier-3 characters do not plan. They exist as participants. They show up in DMs and meetings, they get quoted, they have names and roles and small biographies, but they do not get their own morning routine. The reason is mundane: thirty morning calls would more than triple the daily spend, and most of that spend would be wasted on characters whose job is to say one thing in a meeting, not to drive a storyline. Twenty-one of them are scenery with names. They are still useful (the world feels populated, and the Editor can quote a junior cleric or a navy lieutenant without me having to invent one in the moment), but they do not consume planning tokens.

The events participants array follows a small convention: position 0 is always the initiator. Phase 8's DM sampler relies on that to figure out which DMs a citizen actually sent versus received.
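To make the convention concrete, here is a minimal sketch of how a sampler might separate sent from received DMs. The `EventRow` type mirrors only the fields needed here, and `splitDms` is a hypothetical helper name, not the actual Phase 8 code.

```typescript
// Illustrative only: `EventRow` is a cut-down view of an events row.
type EventRow = { id: number; type: string; participants: string[] }

// Splits a citizen's DMs into sent and received, relying on the rule
// that participants[0] is always the initiator.
function splitDms(events: EventRow[], citizenId: string) {
  const dms = events.filter(
    e => e.type === 'dm' && e.participants.includes(citizenId)
  )
  return {
    sent: dms.filter(e => e.participants[0] === citizenId),
    received: dms.filter(e => e.participants[0] !== citizenId),
  }
}
```

Only the `sent` half says anything about how a character writes, which is why the initiator position matters to a voice judge.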

III. Voices

This is the part of the system that has grown the most since the last time I wrote about it.

Every Tier-1 character has a Voice object in sim/voices.ts, a fingerprint of how they sound:

type Voice = {
  cadence:        string      // sentence rhythm, length, structure
  register:       'formal' | 'plain' | 'ornate' | 'terse'
                  | 'analytical' | 'literary' | 'pragmatic'
  signatureWords: string[]
  forbiddenWords: string[]
  exampleLine:    string
  quirk:          string      // the one thing that gives them away
}

King Juan writes in long sentences, prefers the conditional mood, and slips into French when he is uncomfortable. Renko is short and contraction-heavy; about half his sentences end on a rhetorical question. Cardenal Marín builds long arcs with semicolons and gets more liturgical the more pressure he is under. Marisol opens with a date or a document reference in the first sentence, every time, by design.

The fingerprint enters the system in two forms. The character themselves gets the full voiceBlock (cadence, register, both word lists, the quirk, the example line) pinned into their morning prompt. Anyone they interact with gets a voiceBrief (just cadence and signature words), so when Renko has a meeting with the Cardenal, the LLM resolving that meeting is told: this is how Renko sounds, this is how the Cardenal sounds, keep them distinct. Without that, every voice converges on a generic "thoughtful character in a political drama" register within about two weeks. With it, drift is measurable but slow.
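A minimal sketch of the two renderings, assuming they are plain string templates. The function names `renderVoiceBlock` and `renderVoiceBrief` and the exact line labels are hypothetical; only the split of fields between the two forms comes from the description above.

```typescript
type Voice = {
  cadence: string
  register: 'formal' | 'plain' | 'ornate' | 'terse' | 'analytical' | 'literary' | 'pragmatic'
  signatureWords: string[]
  forbiddenWords: string[]
  exampleLine: string
  quirk: string
}

// Full fingerprint, pinned into the character's own morning prompt.
function renderVoiceBlock(name: string, v: Voice): string {
  return [
    `VOICE: ${name}`,
    `Cadence: ${v.cadence}`,
    `Register: ${v.register}`,
    `Signature words: ${v.signatureWords.join(', ')}`,
    `Never use: ${v.forbiddenWords.join(', ')}`,
    `Quirk: ${v.quirk}`,
    `Example: "${v.exampleLine}"`,
  ].join('\n')
}

// Short form for counterparts in a scene: cadence and signature words only.
function renderVoiceBrief(name: string, v: Voice): string {
  return `${name}: ${v.cadence} Signature words: ${v.signatureWords.join(', ')}.`
}
```

Keeping the brief to two fields keeps the interaction prompt short while still giving the resolving model enough to tell the speakers apart.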

Voices exist so that the cast does not flatten. They get judged because they will flatten anyway.

IV. What counts as a message

The events table is the one durable record of everything that happens between citizens. One row per atomic exchange, public or private:

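// db/schema.ts
import { pgTable, serial, integer, text, jsonb, boolean, real, timestamp } from 'drizzle-orm/pg-core'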
export const events = pgTable('events', {
  id:           serial('id').primaryKey(),
  worldDate:    integer('world_date').notNull(),
  type:         text('type').notNull(),                 // 'dm' | 'meeting' | ...
  participants: jsonb('participants').notNull(),        // citizen ids; pos 0 = initiator
  content:      jsonb('content').notNull(),             // the actual text
  isPublic:     boolean('is_public').default(false),
  witnessedBy:  jsonb('witnessed_by').notNull(),       // who saw a private exchange
  createdAt:    timestamp('created_at').defaultNow()
})

Phase 1 produces one to three PlannedActions per Tier-1 citizen. PlannedAction is a discriminated union with seven shapes:

type PlannedAction =
  | { kind: 'dm';               to: string;        topic: string }
  | { kind: 'meeting';          with: string[];    topic: string }
  | { kind: 'public_statement'; topic: string }
  | { kind: 'op_ed';            topic: string }
  | { kind: 'submit_bill';      title: string }
  | { kind: 'leak';             about: string;     via: 'anonymous' | 'editor_network' }
  | { kind: 'none';             reason: string }

Phase 2 then resolves each planned action with a single Sonnet call, one shared model for all of them. The prompt for that call carries the initiator's voiceBlock, the recipients' voiceBriefs, the topic, and any relevant context. The output is the actual text of the DM, the meeting transcript, the speech, the op-ed, the bill summary, the leaked document, plus a tail with relationship deltas and the isPublic flag that decides who can see it.

A leak is the interesting one. The newsroom is itself a citizen id (NEWSROOM_ID), so a leak resolves into the events table as a DM to the newsroom. That is how "this got handed to the paper" gets cleanly represented without a special-case table.
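A sketch of that mapping, under stated assumptions: the literal value of `NEWSROOM_ID`, the shape of `content`, and the helper name `leakToEvent` are all invented here; only the idea (a leak becomes a private DM whose recipient is the newsroom) comes from the text.

```typescript
// Assumed value; the source names the constant but not what it holds.
const NEWSROOM_ID = 'newsroom'

type LeakAction = { kind: 'leak'; about: string; via: 'anonymous' | 'editor_network' }

// Cut-down insert shape for the events table; `content` layout is assumed.
type NewEvent = {
  worldDate: number
  type: string
  participants: string[]
  content: { text: string; via: string }
  isPublic: boolean
  witnessedBy: string[]
}

// Hypothetical helper: represents a resolved leak as a DM to the newsroom.
function leakToEvent(
  initiator: string,
  action: LeakAction,
  text: string,
  worldDate: number
): NewEvent {
  return {
    worldDate,
    type: 'dm',
    participants: [initiator, NEWSROOM_ID], // position 0 = initiator
    content: { text, via: action.via },
    isPublic: false,
    witnessedBy: [initiator, NEWSROOM_ID], // submitter and the newsroom only
  }
}
```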

V. Public, private, and the six tip channels

Every event is either public or private. Public events go straight into the day's record and into the world brief. Private ones live in the witnessedBy set: the people who were in the room and nobody else.

| Kind | Public? | Who can see it next morning |
| --- | --- | --- |
| public_statement | Yes | Everyone, including the editor |
| op_ed | Yes | Everyone, including the editor |
| submit_bill | Yes | Everyone, including the editor |
| dm | No | Sender and recipient. ~25% chance the editor hears about it. |
| meeting | No | Attendees only. ~25% chance of a leak. |
| leak | No | Submitter and the newsroom. |

The Editor's view of the day's signal comes from a tip pool, built in Phase 3. Five kinds of tips, plus Don Cordoba on a separate channel:

type Tip =
  | { kind: 'marisol';        eventId: number; angle: string }
  | { kind: 'anonymous';      eventId: number; submittedBy: string }
  | { kind: 'editor_network'; eventId: number }
  | { kind: 'cordoba';        eventId: number; suggestion: 'lead' | 'kill' | 'soften' }
  | { kind: 'beat';           eventId: number; beat: 'parliament' | 'religion' | 'sports' }

Marisol's leads come out of an LLM call (she is the only Tier-1 character whose job is to feed the paper, and her tips arrive with a one-line angle). The anonymous and editor-network channels are derived from the day's leaks and from a ~25% leak rate on private events. The beat channel is derived from public events on a topic the paper covers as a beat. And Don Cordoba whispers directly to the Editor with one of three suggestions, except that the runtime currently only emits 'lead' or 'soften'. The 'kill' value is reserved; I left it in the type because it is going to be useful when I add the kill channel properly.
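The ~25% leak rate can be sketched with the tick's seeded RNG so that a replay of the same day leaks the same events. Everything named here is an assumption: `mulberry32` is a well-known small PRNG standing in for whatever generator the context object actually carries, `sampleLeaks` is a hypothetical helper, and routing sampled events to the `editor_network` channel is a guess at how the two derived channels are split.

```typescript
// Stand-in seeded PRNG (mulberry32); the simulator's real RNG may differ.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296 // uniform in [0, 1)
  }
}

type PrivateEvent = { id: number }
type LeakedTip = { kind: 'editor_network'; eventId: number }

// Each private event independently reaches the Editor with probability `rate`.
function sampleLeaks(
  privateEvents: PrivateEvent[],
  rng: () => number,
  rate = 0.25
): LeakedTip[] {
  return privateEvents
    .filter(() => rng() < rate)
    .map((e): LeakedTip => ({ kind: 'editor_network', eventId: e.id }))
}
```

Because the draw depends only on the seed and the event order, the same tick can be re-run for debugging without the tip pool changing underneath it.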

VI. The world brief

Continuity used to be the weakest part of the simulator. Yesterday's bombshell would vanish overnight. Today's article would talk about a strike that nobody had mentioned for a week. The system had no memory because nothing was forcing it to.

The fix is sim/world-brief.ts. A deterministic markdown digest, no LLM call, pure derivation from the past 14 days of bulletins and the past 7 days of events. It has four sections:

  1. Open storylines: the current arc titles (more on arcs in a moment).
  2. What's happening this week: the last 8 distinct headlines from the past 14 days, deduped, excluding any already linked to an arc.
  3. Most active citizens (past 7 days): top 5 by event-participation count.
  4. The Editor's recent off-print decisions: the most recent 5 Buried, Spun, or Omitted calls, with rationale.

The brief gets pinned into Phase 1 (every Tier-1 character reads it as part of their morning packet) and into Phase 4 (the Editor reads it again, alongside everything else). Cheap, debuggable, no model spend, and it is the single biggest reason the kingdom feels like it is running forward instead of resetting every dawn.
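A minimal sketch of such a derivation, assuming world dates are integer day numbers and that the inputs have already been loaded from the database. The type names, the `buildWorldBrief` signature, and the exact markdown layout are illustrative; only the four sections and their limits come from the list above.

```typescript
type Headline = { worldDate: number; headline: string; arcId?: string }
type Ev = { worldDate: number; participants: string[] }
type Decision = {
  worldDate: number
  storyId: string
  decision: 'printed' | 'spun' | 'buried' | 'omitted'
  rationale: string
}

const bullets = (xs: string[]) =>
  xs.length ? xs.map(x => `- ${x}`).join('\n') : '- (none)'

function buildWorldBrief(
  today: number,
  openArcTitles: string[],
  headlines: Headline[],
  events: Ev[],
  decisions: Decision[]
): string {
  // Section 2: last 8 distinct headlines from the past 14 days, arc-linked ones excluded.
  const recent = headlines
    .filter(h => h.worldDate > today - 14 && h.worldDate <= today && !h.arcId)
    .sort((a, b) => b.worldDate - a.worldDate)
    .map(h => h.headline)
  const distinct = recent.filter((h, i) => recent.indexOf(h) === i).slice(0, 8)

  // Section 3: top 5 citizens by event participation over the past 7 days.
  const counts: Record<string, number> = {}
  for (const e of events) {
    if (e.worldDate <= today - 7 || e.worldDate > today) continue
    for (const p of e.participants) counts[p] = (counts[p] ?? 0) + 1
  }
  const active = Object.keys(counts)
    .sort((a, b) => counts[b] - counts[a] || a.localeCompare(b))
    .slice(0, 5)
    .map(id => `${id} (${counts[id]})`)

  // Section 4: the 5 most recent Buried, Spun, or Omitted calls.
  const offPrint = decisions
    .filter(d => d.decision !== 'printed')
    .sort((a, b) => b.worldDate - a.worldDate)
    .slice(0, 5)
    .map(d => `${d.decision}: ${d.storyId} (${d.rationale})`)

  return [
    '## Open storylines', bullets(openArcTitles),
    "## What's happening this week", bullets(distinct),
    '## Most active citizens (past 7 days)', bullets(active),
    "## The Editor's recent off-print decisions", bullets(offPrint),
  ].join('\n')
}
```

Since the function is pure, the same inputs always yield the same brief, which is what makes it easy to diff from one day to the next.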

The kingdom now has a memory. Most of it is a markdown file.

VII. Story arcs

The Editor runs the paper's running stories as arcs: durable narrative threads in their own table.

// db/schema.ts
export const storyArcs = pgTable('story_arcs', {
  id:                  text('id').primaryKey(),       // kebab-case slug
  title:               text('title').notNull(),
  summary:             text('summary').notNull(),
  subjectIds:          jsonb('subject_ids'),         // string[]
  status:              text('status'),               // 'open' | 'closed'
  openedOnDate:        integer('opened_on_date'),
  lastAdvancedOnDate:  integer('last_advanced_on_date'),
  entryCount:          integer('entry_count').default(0)
})
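// Column names and types below are assumed; the original lists only the field names.
export const arcEntries = pgTable('arc_entries', {
  arcId:      text('arc_id').notNull(),          // references story_arcs.id
  bulletinId: integer('bulletin_id').notNull(),
  storyId:    text('story_id').notNull(),        // e.g. 'art-47-01'
  worldDate:  integer('world_date').notNull()
})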

When the Editor writes a bulletin, each article can do one of three things with arcs: open a new one (newArc: { title, summary, subjectIds }), advance an existing open arc by setting arcId, or do nothing at all. The Editor can also close arcs without writing about them today; sometimes a story just ends and you have to admit it.

There is a hard cap of 6 open arcs at a time. If the Editor wants to open a 7th, they have to close one first. This was empirically necessary; without the cap, the model would happily spin up a new arc per article for a week and then never advance any of them. The cap forces editorial judgement about what is actually still alive.
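One way the cap could be enforced when the Editor's output is applied, sketched under assumptions: closes are processed before opens, and any open that would exceed the cap is rejected and reported rather than silently dropped. `applyArcOps` and the `ArcOps` shape are hypothetical; the real handling of a refused seventh arc is not described in the text.

```typescript
const MAX_OPEN_ARCS = 6

type Arc = { id: string; status: 'open' | 'closed' }
type ArcOps = { close: string[]; open: string[] } // arc ids to close, then to open

// Applies closes first, then opens up to the cap; extra opens are returned as rejected.
function applyArcOps(arcs: Arc[], ops: ArcOps): { arcs: Arc[]; rejected: string[] } {
  const next: Arc[] = arcs.map((a): Arc =>
    ops.close.includes(a.id) ? { ...a, status: 'closed' } : a
  )
  const rejected: string[] = []
  let openCount = next.filter(a => a.status === 'open').length
  for (const id of ops.open) {
    if (openCount >= MAX_OPEN_ARCS) {
      rejected.push(id)
      continue
    }
    next.push({ id, status: 'open' })
    openCount++
  }
  return { arcs: next, rejected }
}
```

Processing closes first means a single bulletin can retire one storyline and start another without ever going over six.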

The world brief surfaces the open-arc list to everyone. The Editor's own prompt also gets a richer "Storylines you are running" block: how many days ago each arc opened, how many entries it has, when it was last advanced. So the system gently nags the Editor about the storyline they have not advanced in five days.

VIII. Weekly recaps and the State of the Kingdom

Sundays are when the system writes about itself.

After the daily core finishes on a world-Sunday, Phase 7-weekly runs a short LLM call per Tier-1 character. Each character writes their own weekly recap, in their own voice, using their own model. They get a structured digest: events they were involved in, DMs they witnessed, articles that mentioned them, and their previous week's summary as a memory anchor. Output is plain text, around 400 tokens. Stored in weekly_summaries keyed by (weekEnding, citizenId).

Two things happen with these recaps. The next week, each character's own summary becomes the "last week's recap" line in their morning packet, so they remember, in their own words, what they thought was important. And the first sentence of each summary becomes the "where each principal stands this week" rollup that the Editor sees in Phase 4. Nine one-liners, in nine voices, each one a principal's self-assessment of their week. That is the closest thing the Editor has to a briefing.
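The rollup step can be sketched as a first-sentence extraction over the stored recaps. The `WeeklySummary` shape follows the (weekEnding, citizenId) key mentioned above, but the `text` field name and both helper names are assumptions, and the sentence splitter is deliberately naive: it would cut an abbreviation such as "V. Aldama" at the first period.

```typescript
type WeeklySummary = { weekEnding: number; citizenId: string; text: string }

// Naive splitter: everything up to the first '.', '!' or '?' that is
// followed by whitespace or the end of the text.
function firstSentence(text: string): string {
  const trimmed = text.trim()
  const m = trimmed.match(/^[\s\S]*?[.!?](?=\s|$)/)
  return m ? m[0] : trimmed
}

// One line per principal, for the Editor's "where each principal stands" block.
function principalsRollup(summaries: WeeklySummary[]): string {
  return summaries
    .map(s => `${s.citizenId}: ${firstSentence(s.text)}`)
    .join('\n')
}
```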

On the last day of every calendar month, Phase 7-monthly fires. A single narrator call (Sonnet) writes a State of the Kingdom digest: that month's headlines, the 9 weekly one-liners, the open arcs, the event count. Stored in nation_summaries, then pinned into the Editor's prompt for the rest of the next month as the "last month" frame.

So the Editor at any point in, say, mid-July is writing articles with an awareness of June's overall shape, last Sunday's principal-by-principal rollup, the past two weeks of headlines, the open arcs, today's events, and today's tips. None of those layers are expensive. Most of them are derived. The only paid calls in the memory stack are the nine weekly recaps and the monthly narrator, and the monthly only fires twelve times a year.

IX. The Editor

The Editor's call is the most expensive thing the simulator does. One Opus 4.7 call per day, 16k-token output budget, JSON output, with articles[] and decisions[] as the two top-level keys.

The bias profile used to be a single hand-authored sentence. It is now built fresh every tick from sim/editor-bias.ts and sim/editor-mood.ts.

The hand-authored seed is still there: a patron debt to Don Cordoba (he holds the Editor's pension), a standing grudge against PM Vela ("technocrat with no instincts") and Renko ("street theatre, not politics"), soft spots for the Cardenal, the navy, and the old Almaria Vella merchants. A personal note: "You are 67, your daughter writes from London, the gout is bad in spring." That part is fixed.

What is computed per tick are two extra lines. A mood, one of weary | restless | sharp | indulgent | cautious, deterministically rotated by world-day, except it flips to cautious if the Editor has buried 3 or more stories in the past 7 days. And a recent off-prints line that counts the past week of Spun and Buried decisions and tells the Editor, in plain language, to "pull back on burying this week" if the count is heavy.
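The mood rule is simple enough to sketch directly. Two things are assumed here: that "rotated by world-day" means a modulo over the list in the order given, and that "the past 7 days" covers the seven world-days before today. `editorMood` and `PastDecision` are illustrative names, not the contents of sim/editor-mood.ts.

```typescript
const MOODS = ['weary', 'restless', 'sharp', 'indulgent', 'cautious'] as const
type Mood = typeof MOODS[number]

type PastDecision = {
  worldDay: number
  decision: 'printed' | 'spun' | 'buried' | 'omitted'
}

// Deterministic mood: rotate by world-day, but force 'cautious'
// after 3 or more buried stories in the previous 7 world-days.
function editorMood(worldDay: number, history: PastDecision[]): Mood {
  const buriedLastWeek = history.filter(
    d =>
      d.decision === 'buried' &&
      d.worldDay >= worldDay - 7 &&
      d.worldDay < worldDay
  ).length
  if (buriedLastWeek >= 3) return 'cautious'
  return MOODS[worldDay % MOODS.length]
}
```

No RNG and no model call are involved, so the mood for any past day can be recomputed exactly from the decisions table.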

So bias has texture now. The Editor can be a sharp version of themselves on a Tuesday and a weary one on a Thursday, and the system pushes back if they over-use any one tool. The four-valued decision is unchanged (printed | spun | buried | omitted, still the unit of editorial bias), but now the unit is being audited week to week.

The Editor's full envelope, in order:

  1. The bias profile (with mood and recent off-prints).
  2. The world brief.
  3. The nation summary (last month).
  4. The "where each principal stands this week" rollup (one liner per Tier-1).
  5. The open arcs block (with days-since-opened, entry count, days-since-advanced).
  6. Today's public events.
  7. The tip pool.
  8. Don Cordoba's signals.
  9. The day's exogenous facts.

Plus, in the system message under HOUSE STYLE, a voice notes block: last week's voice-eval note for each principal, so the Editor's prose stays consistent with how each character has actually been sounding.

The output is JSON: a list of articles (headline, body, slot, optional arcId or newArc, optional quoteSources) and a list of decisions (one per topic the desk considered, with the four-valued verdict and a rationale).

{
  "title": "Morning Edition · Day 47",
  "articles": [
    {
      "storyId": "art-47-01",
      "section": "Politics · Crown",
      "headline": "King calls for calm amid harbor strike",
      "body": "... 600 words ...",
      "arcId": "arc-harbor-strike",
      "quoteSources": [12, 17]
    }
  ],
  "decisions": [
    { "storyId": "art-47-01",   "decision": "printed", "rationale": "clean public event, lead-worthy" },
    { "storyId": "tip-leak-22", "decision": "spun",    "rationale": "Cordoba asked us to soften; reframed as procedural" },
    { "storyId": "tip-leak-44", "decision": "omitted", "rationale": "hostile to Renko but unverifiable; spike" }
  ]
}

X. Quote validation

Once the Editor has written articles, every direct quote in every article body has to trace back to something. Each article carries a quoteSources array (event ids drawn from today's public events or the tip pool), and after Phase 4 parses the bulletin, every article runs through validateQuotes in sim/quote-validator.ts.

The check is deliberately forgiving. Any quoted span of 3 or more words is folded against the cited source bodies; near-matches pass. Anything that does not fold-match any cited source gets logged. Warnings flow into Phase 5 and write rows to quote_validations.
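A minimal sketch of a fold-and-match check in that spirit. It assumes "folding" means lowercasing, stripping punctuation, and collapsing whitespace, and it treats a quote as matched when its folded form is a substring of any folded source. The real validateQuotes may tolerate looser near-matches than this; the helper names `fold` and `extractQuotes` are invented.

```typescript
// Lowercase, drop common punctuation, collapse whitespace.
function fold(s: string): string {
  return s
    .toLowerCase()
    .replace(/[.,;:!?'"“”‘’()-]/g, '')
    .replace(/\s+/g, ' ')
    .trim()
}

// Pull out spans between straight or curly double quotes.
function extractQuotes(body: string): string[] {
  const spans: string[] = []
  const re = /["“]([^"”]+)["”]/g
  let m: RegExpExecArray | null
  while ((m = re.exec(body)) !== null) spans.push(m[1])
  return spans
}

// Returns the quoted spans (3+ words) that match none of the cited sources.
function validateQuotes(body: string, sourceBodies: string[]): string[] {
  const foldedSources = sourceBodies.map(fold)
  return extractQuotes(body)
    .filter(q => fold(q).split(' ').length >= 3)
    .filter(q => !foldedSources.some(src => src.includes(fold(q))))
}
```

The return value is the list of warnings; an empty array means every quote of three or more words traced back to a cited source.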

This is a soft fail. The article still prints. The desk just flags it, and the flag shows up on /mechanism. The Editor's prompt has a QUOTE ATTRIBUTION section spelling out the rule, and over time the unattributed-quote rate has dropped from "alarming" to "occasional", which is about where I want it. A real paper has the same problem.

XI. Voice drift, judged

Every world-Sunday, after the weekly recaps run, Phase 8 fires. One judge call, Sonnet in the narrator role, reads a sample of the week's quotes and DMs from each Tier-1 character, compares them against the canonical voiceBlock, and scores drift on a 0-1 scale.

The output is one row per citizen in voice_evals:

// db/schema.ts
export const voiceEvals = pgTable('voice_evals', {
  weekEnding: integer().notNull(),
  citizenId:  text().notNull(),
  pass:       boolean().notNull(),
  drift:      real().notNull(),                    // 0..1
  note:       text().notNull(),                    // short prose, one or two sentences
  exemplars:  jsonb().notNull(),                   // quotes that did sound like the character
  violations: jsonb().notNull()                    // quotes that did not
})

The VoiceDriftWidget on /mechanism shows the latest week. Over the last three months, Renko has been the most drift-prone (his contractions slowly disappear if nothing pulls them back), and Marisol has been the most stable (the date-or-document opener is a structural tell, hard for the model to forget).

There is also, gated behind VOICE_SELF_CORRECTION=on, a feedback loop. When the env var is set, the most recent voice-eval note per citizen is fed back into Phase 1 and Phase 2 as a "last week's note" line in the prompt. The system can, when I let it, try to correct its own drift. It is off by default. This is the kind of loop that is interesting to watch in a controlled run and dangerous to leave running unattended for a month: the model can over-correct, and you end up with a Renko whose every sentence ends in a question mark because that was the note three weeks ago.

Drift is the price of running for a long time. The judge is the price of measuring it.

XII. The model line-up

Nine LLMs for the principals, plus three utility roles. All of it routed through OpenRouter so I can swap models without touching the simulator.

| Role | Model |
| --- | --- |
| King Juan | Anthropic Sonnet 4.6 |
| PM Vela | OpenAI GPT-5 |
| Renko | Llama-4 Maverick |
| Don Rafael | xAI Grok-4 |
| Cardenal Marín | Cohere Command-A |
| V. Aldama (Editor) | Anthropic Opus 4.7 |
| Don Cordoba | DeepSeek v3.2 |
| Ferré | Mistral Large |
| Marisol Vega | Google Gemini 2.5 Pro |
| Phase 2 interactions | shared Sonnet |
| Distiller (RSS) | Sonnet |
| Narrator (monthly + voice judge) | Sonnet |

The Editor's Opus call is by a wide margin the most expensive thing in the tick, usually 40-55% of the day's spend on its own. Phase 2 is the next biggest line item, since it scales with the number of planned actions.

Picking nine different model families is partly aesthetic and partly practical. Aesthetic: I wanted the principals to sound different in a way that was not just prompt-engineered. A King written by Sonnet does, in fact, read differently from a King written by GPT-5, even with the same voiceBlock. Practical: I wanted to know what was tractable across providers, and running this thing daily for months turns out to be a very honest evaluation.

XIII. Who knows what

There are three readers of the events table, and they each see a different slice.

A citizen, in their morning routine, sees: any public event from the past 14 days, any private event they participated in or witnessed, their own previous week's recap, the world brief, the nation summary, the day's exogenous facts, and (if the env var is on) their last voice-eval note. They do not see other citizens' DMs, other citizens' weekly recaps, the tip pool, or the Editor's bias profile.

The Editor, in Phase 4, sees the full envelope laid out above: the world brief, the nation summary, the principals rollup, open arcs, today's public events, the tip pool, Cordoba's signals, exogenous facts, and the voice notes block. They do not see private DMs unless those DMs surfaced as tips. They do not see the inside of any Tier-1 character's morning prompt.

A reader of El Heraldo sees only what got printed. That is the published bulletin and nothing else: no decisions, no spiked stories, no buried context, no idea that Don Cordoba whispered "soften" three days running.

There is a fourth view: a reader with Backstage on (the toggle in the masthead) sees the bulletin alongside the decisions table, the arcs board, the world brief, the voice-drift widget, and the call log. That is the view that exists for me and for the curious: the whole apparatus, not just the front page.
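The first two views reduce to filters over the events table. This is a sketch under assumptions: world dates are integer day numbers, the helper names are invented, and the Editor's view is modelled only as "today's public events plus whatever surfaced as a tip", leaving out the non-event parts of the envelope.

```typescript
type Ev = {
  id: number
  worldDate: number
  isPublic: boolean
  participants: string[]
  witnessedBy: string[]
}

// A citizen's slice: public events from the past 14 days, plus any
// private event they took part in or witnessed.
function visibleToCitizen(events: Ev[], citizenId: string, today: number): Ev[] {
  return events.filter(e =>
    e.isPublic
      ? e.worldDate > today - 14 && e.worldDate <= today
      : e.participants.includes(citizenId) || e.witnessedBy.includes(citizenId)
  )
}

// The Editor's slice: today's public events, plus private events
// only when they reached the desk as tips.
function visibleToEditor(events: Ev[], tipEventIds: number[], today: number): Ev[] {
  return events.filter(
    e => (e.isPublic && e.worldDate === today) || tipEventIds.includes(e.id)
  )
}
```

The reader's view needs no filter at all, since it never touches the events table: it is the printed bulletin and nothing else.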

XIV. Why this shape

Almaria did not start out as a study in editorial bias. It started as an excuse to run nine LLMs against each other and see what they would do. The answer, at first, was: not much. They had no memory, no consistent voices, no incentive to react to each other across days. The world reset every morning.

Most of what is in this essay is the answer to that. The world brief is memory. Voices are continuity of character. Arcs are continuity of story. Weekly recaps and the monthly narrator are continuity of self-reflection. The voice judge is the auditor that tells me when any of that is breaking down. The Editor's mood and recent-off-prints lines are the auditor that tells the Editor when they are breaking down.

The architecture is not elegant. The phase numbers are out of order: Phase 6 runs last, Phase 7 has a weekly half and a monthly half, Phase 8 only fires on Sundays. The interaction Sonnet is one model wearing nine masks because nine more model calls was not in the budget. Cordoba's 'kill' channel exists in the type and not in the runtime. Voice self-correction is gated off because I do not trust it unattended. These are all small warts and I am leaving them in because the alternative is a refactor that does not change what comes out of the paper.

The thing that has not changed since the first version is the gap: between what thirty named citizens privately do in a day and what the country publicly reads about it the next morning. Everything new (the voices, the arcs, the recaps, the moods) exists to give that gap texture over time. The country has a memory now, the Editor has moods, the characters drift and get judged for drifting. The paper still goes out at dawn, and the readers still only see what got printed.

Most of the bias in a real newsroom is not in any one decision; it is in the accumulated weight of small ones, made by tired people with patrons and grudges and Sunday rollups of what their colleagues said this week. The simulator runs that weight on a clock you can see. The clock ticks once a day, and it costs about two dollars.

It also runs in the open. Every prompt, every response, every off-print decision lands in the call log, and the call log is something anyone can read with Backstage on. That was not really a design choice; it was the only version of this that felt honest. A real paper keeps the spiked stories, the patron's nudges, and the editor's bad week behind a closed door. Almaria does not have that door. If you want to know why an article got buried on a Thursday in April, the rationale is right there next to the article, in the model's own words. I find I learn more from those rows than from the bulletin most days.