Figma → Code, The Hard Way

May 08, 2026

Most “Figma to code” tools are vertical: pick a stack (usually React or Flutter), hard-code the mapping, ship a plugin. They work — until your codebase isn’t React or Flutter, or until your team’s component library, naming conventions, and design tokens don’t match the canned ones.

Over the past few months I built a different shape of the same tool: a stack-agnostic agent that treats the target framework as an injection point, paired with a desktop client that uses the agent itself as its backend. This post walks through both halves — the agent skill design and how it adapts to arbitrary stacks (Section 1), and the desktop UI that streams lifecycle events back from the agent and never opens a single REST endpoint of its own (Section 2).

The agent code lives in devhub_devharness/agents/figma2code, the desktop client lives under project/desktop/.../FigmaToCode. Diagrams below are rendered with Mermaid; the loader is injected at the bottom of this post.

Section 1 — The agent layer

Why three skills, not one

The agent exposes three independently runnable skills:

| Skill | Idempotent over… | What it touches |
|---|---|---|
| generic-design-init | a workspace (once per project) | writes <workspace>/.hx/figma2code/ — config, adapters, schema docs |
| generic-design-parser | a Figma URL (once per snapshot) | writes <workspace>/.hx/figma2code/output/<framework>/<timestamp>/ |
| generic-code-gen | a snapshot + change scope | edits the workspace in place |

Splitting them isn’t bookkeeping — it’s a dependency contract. Parser and code-gen both assume the workspace is initialized; they validate that assumption with a validateF2CWorkspace tool call at boot, and refuse to run if config.json is missing or its schemaVersion drifted from the code-side SCHEMA_VERSION. There is no graceful-degradation path. Either the workspace is in a known-good shape, or you go re-run init.

This is unfashionable for an LLM tool — agents love to recover from missing files by guessing — but the alternative is silent corruption of someone’s actual project. The cost of “fail loud” is one extra hop; the cost of “be helpful” is a rewrite that doesn’t compile.
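Concretely, the gate is a few lines. A minimal sketch, assuming the Failure envelope is a { code, message } pair (the real tool's shape is richer, and the constant's value here is made up):

import * as fs from 'node:fs';
import * as path from 'node:path';

// Hypothetical sketch of validateF2CWorkspace's core check.
const SCHEMA_VERSION = 3; // in reality read from extensions/schema-version.json

function validateF2CWorkspace(workspace: string): { code: number; message: string } {
	const configPath = path.join(workspace, '.hx', 'figma2code', 'config.json');
	if (!fs.existsSync(configPath)) {
		return { code: 1, message: 'config.json missing; run generic-design-init first.' };
	}
	const config = JSON.parse(fs.readFileSync(configPath, 'utf8'));
	if (config.schemaVersion !== SCHEMA_VERSION) {
		// No migration path, by design.
		return { code: 2, message: 'schemaVersion mismatch: delete .hx/figma2code/ and re-run generic-design-init.' };
	}
	return { code: 0, message: 'ok' };
}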

The agent is also VCS-aware from the very first skill. Init asks the user which version-control system the project uses — strictly one of git / p4v / none, no free-form input — and the answer rides along through config.json’s vcs field. Two things follow from that:

  • Perforce credentials get a separate, gitignored home. When the user picks p4v, init collects P4PORT / P4USER / P4CLIENT, writes them to <workspace>/.hx/figma2code/.p4config in standard P4CONFIG format, and idempotently appends that path to <workspace>/.gitignore. The credentials never enter config.json, never get committed, and never appear in any prompt the agent sends to the LLM. Only the fact that vcs is p4v is public; the keys themselves stay local.
  • Code-gen edits in place so the user’s existing review tooling owns the diff. The skill never opens a branch, never stages a commit, never runs git add. It writes files; the user sees them as M / ?? / A in their normal git status or P4 changelist. The desktop’s file tree, in turn, runs a one-shot GitChangeScanner / P4VChangeScanner after each agent turn and color-codes nodes by ChangeKind so the user can visually verify what the last skill touched. The agent and the version-control system stay in their respective lanes — the agent owns what changed, the VCS owns what gets committed.
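To make the credential handling concrete, here is a sketch of the init-side write (the helper name and exact layout are mine, not the skill's source):

import * as fs from 'node:fs';
import * as path from 'node:path';

// Hypothetical: P4 credentials go to a local .p4config, never config.json.
function writeP4Config(workspace: string, p4: { port: string; user: string; client: string }): void {
	const p4config = path.join(workspace, '.hx', 'figma2code', '.p4config');
	fs.writeFileSync(p4config, `P4PORT=${p4.port}\nP4USER=${p4.user}\nP4CLIENT=${p4.client}\n`);

	// Idempotent append: add the ignore entry only if it isn't already present.
	const gitignorePath = path.join(workspace, '.gitignore');
	const entry = '.hx/figma2code/.p4config';
	const current = fs.existsSync(gitignorePath) ? fs.readFileSync(gitignorePath, 'utf8') : '';
	if (!current.split('\n').includes(entry)) {
		fs.appendFileSync(gitignorePath, (current === '' || current.endsWith('\n') ? '' : '\n') + entry + '\n');
	}
}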
flowchart LR
    A["generic-design-init<br/>(once per workspace)"] --> B["generic-design-parser<br/>(once per snapshot)"]
    B --> C["generic-code-gen<br/>(once per change scope)"]
    A -.writes.-> W["<workspace>/.hx/figma2code/<br/>config.json + adapters + schema doc"]
    B -.writes.-> S["output/<framework>/<timestamp>/<br/>semantic-ast.json + screenshots/ + image-map.json"]
    C -.edits in place.-> P["target project files"]
    W -.gates.-> B
    W -.gates.-> C
    S -.consumed by.-> C
    style W fill:#fff7d6,stroke:#aa9
    style S fill:#e6f4ff,stroke:#79a
    style P fill:#dff7e0,stroke:#7a8

The “generic flow + workspace adapter” pattern

The whole architecture rests on a single observation: almost every step in a Figma-to-code pipeline is framework-independent. Fetching a node tree, walking it, exporting screenshots, deduping images, normalizing colors to RGBA — none of those care whether you ship Flutter or SwiftUI. What does care is a small set of decisions clustered at the boundaries:

  • How does a {r: 1, g: 0.5, b: 0, a: 1} color render as a string? Color(0xffff8000) for Flutter, rgb(255, 128, 0) for CSS, #FFFF8000 for Avalonia.
  • What does layoutMode: HORIZONTAL map to? "row" in Flutter, "row" in Avalonia, flex-direction: row in CSS.
  • What’s a vector group called when you flatten it? SvgPicture in Flutter, Image (with svg source) in Compose, <svg> in web.

So the parser ships a strict interface, FrameworkAdapter — and asks each workspace to provide its own implementation:

export interface FrameworkAdapter {
	transformColor(rgba: RGBA): string;
	layoutMaps: LayoutMaps; // direction / mainAxisAlign / crossAxisAlign / mainAxisSize
	mapNodeType(figmaType: string, ctx: NodeTypeContext): string | undefined;
	svgPictureType: string; // must equal mapNodeType('VECTOR', ...)
	staticImageType: string; // must equal mapNodeType('IMAGE',  ...)
	postAstTokenGenerated?(astPath: string): Promise<void>; // lifecycle hook (optional)
}

The adapter file lives at <workspace>/.hx/figma2code/extensions/generic-design-parser.ts. The parser dynamically import()s it at runtime and validates the five required fields before doing anything. Three things make this work in practice:

  1. The init skill writes the adapter, not the user. When a workspace is first initialized, the agent reads the project’s actual code (pubspec.yaml, *.cs, whatever), figures out the stack, and generates the adapter from a template + the workspace’s existing conventions. The user gets a working file they can later hand-edit; they don’t start from a blank page.
  2. The AST schema doc is rendered from the adapter. A second template, astSyntaxAutoGenerated.md, has 17 placeholders. The init skill calls each adapter method and substitutes real values. Code-gen later reads this file as the authority on what type: "Row" means in this workspace — never the parser source code, never a hard-coded table.
  3. A second adapter exists for code-gen. generic-design-gen.ts exposes one hook, postCodeGenFinished(isSuccess, message), called after the skill finishes writing files. The default implementation is empty; teams use it to kick off flutter run, trigger a hot reload, or run dotnet build for a smoke check. It’s a deliberate echo of postAstTokenGenerated — a single, named lifecycle hook the user can specialize without touching the skill internals.

The trick is pushing all framework knowledge into two ~150-line TypeScript files that the workspace owns, and keeping the skill itself a generic walker.
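For a sense of what init actually emits, here is a plausible Flutter-flavored adapter. This is illustrative only: the LayoutMaps shape, the NodeTypeContext field I read, and the exact literals are assumptions; the real generated file is derived from the workspace's own conventions.

// Hypothetical Flutter implementation of FrameworkAdapter (shapes assumed).
const flutterAdapter: FrameworkAdapter = {
	transformColor({ r, g, b, a }) {
		// 0..1 floats to a Color(0xAARRGGBB) literal.
		const hex = (v: number) => Math.round(v * 255).toString(16).padStart(2, '0');
		return `Color(0x${hex(a)}${hex(r)}${hex(g)}${hex(b)})`;
	},
	layoutMaps: {
		direction:      { HORIZONTAL: 'row', VERTICAL: 'column' },
		mainAxisAlign:  { MIN: 'start', CENTER: 'center', MAX: 'end', SPACE_BETWEEN: 'spaceBetween' },
		crossAxisAlign: { MIN: 'start', CENTER: 'center', MAX: 'end' },
		mainAxisSize:   { FIXED: 'max', AUTO: 'min' },
	},
	mapNodeType(figmaType, ctx) {
		if (figmaType === 'FRAME') return ctx.layoutMode === 'HORIZONTAL' ? 'Row' : 'Column';
		return ({ TEXT: 'Text', VECTOR: 'SvgPicture', IMAGE: 'Image' } as Record<string, string | undefined>)[figmaType];
	},
	svgPictureType: 'SvgPicture', // must equal mapNodeType('VECTOR', ...)
	staticImageType: 'Image',     // must equal mapNodeType('IMAGE', ...)
	postAstTokenGenerated: async () => { /* default: no-op */ },
};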

Figma API and the Design Token bridge

The parser pulls three things from Figma’s REST API:

  • GET /v1/files/:key — the document tree, with boundVariables references inlined.
  • GET /v1/images/:key?ids=... — PNG renders for vector groups, image fills, and any node we intend to render as an asset.
  • GET /v1/files/:key/variables/local — the Design Variables system, which is Figma’s first-class token store (colors, numbers, strings, booleans, with mode-aware values).

The third one is the interesting one. Figma’s Variables API gives you typed tokens grouped by mode (light/dark, brand A/brand B, …), and the document references them via boundVariables. A button fill that says boundVariables.fills[0] = { type: "VARIABLE_ALIAS", id: "VariableID:42:1" } is telling you “this fill is whatever Color/Primary resolves to” — not a hard-coded #3FD47A.

The parser turns this into a workspace-side design-variables.json with two passes:

1. Resolve aliases: chase VARIABLE_ALIAS chains up to 6 hops, write the leaf value back
   into the AST node's color field directly. (No more aliases in semantic-ast.json.)
2. Emit the token table: flatten all variables into { name, type, mode, value } records,
   route value through adapter.transformColor for COLOR types so the output is already
   in target-stack literals.

Why resolve aliases at parse time instead of at code-gen time? Because the rendering phase needs concrete colors to do the screenshot dedup correctly (two nodes referencing the same alias should produce identical bitmaps), and because the resulting AST is then trivially consumable by any downstream tool — including non-LLM tools — that doesn’t speak Figma’s variable schema.
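A sketch of the alias chase, assuming Figma's documented VARIABLE_ALIAS shape and the 6-hop cap mentioned above (helper and type names are mine):

type Rgba = { r: number; g: number; b: number; a: number };
type Leaf = string | number | boolean | Rgba;
type ModeValue = Leaf | { type: 'VARIABLE_ALIAS'; id: string };
interface Variable { id: string; name: string; valuesByMode: Record<string, ModeValue> }

const MAX_ALIAS_HOPS = 6;

function resolveAlias(vars: Map<string, Variable>, id: string, mode: string): Leaf {
	let current = vars.get(id);
	for (let hop = 0; current && hop < MAX_ALIAS_HOPS; hop++) {
		const value = current.valuesByMode[mode];
		if (typeof value === 'object' && value !== null && 'type' in value) {
			current = vars.get(value.id); // one more hop up the chain
			continue;
		}
		return value as Leaf; // concrete leaf, written straight into the AST node
	}
	throw new Error(`alias chain for ${id} is dangling or exceeds ${MAX_ALIAS_HOPS} hops`);
}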

The code-gen skill consumes design-variables.json directly and is expected to map tokens to the target stack’s idiomatic representation: a ThemeData extension in Flutter, a static class of Color constants in Avalonia, a CSS :root { --color-primary: ... } block in web. The mapping is the agent’s job during code-gen — not the parser’s. The parser only resolves and flattens.
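As one example of that code-gen-side mapping, a sketch that renders the flattened records as a CSS :root block (the record shape follows the two passes above; the name-mangling rule is an assumption):

interface TokenRecord { name: string; type: string; mode: string; value: string }

// "Color/Primary" becomes "--color-primary"; value is already a target-stack literal.
function emitCssTokens(tokens: TokenRecord[], mode: string): string {
	const lines = tokens
		.filter((t) => t.mode === mode)
		.map((t) => `\t--${t.name.toLowerCase().replace(/[\/\s]+/g, '-')}: ${t.value};`);
	return `:root {\n${lines.join('\n')}\n}`;
}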

flowchart TB
    subgraph Figma["Figma REST"]
        F1["GET /files/:key"]
        F2["GET /images/:key"]
        F3["GET /files/:key/variables/local"]
    end
    subgraph Parser["generic-design-parser run-all.ts"]
        P1["fetch-design.ts<br/>resolve VARIABLE_ALIAS chains"]
        P2["export-screenshots.ts<br/>scale=2, PNG"]
        P3["to-semantic-ast.ts<br/>walk tree → adapter.mapNodeType"]
        P4["extract-material.py<br/>md5 + perceptual hash dedup"]
    end
    subgraph Snapshot["<snapshot>/"]
        O1[design-context.json]
        O2[screenshots/*.png]
        O3[semantic-ast.json]
        O4[design-variables.json]
        O5[image-map.json]
    end
    F1 --> P1 --> O1
    F2 --> P2 --> O2
    F3 --> P1
    P1 --> P3 --> O3
    P3 --> O4
    P2 --> P4 --> O5
    P4 -.rewrites.-> O3
    P3 --calls--> A["adapter.transformColor<br/>adapter.mapNodeType<br/>adapter.layoutMaps"]
    style A fill:#fde2e2,stroke:#a55

Screenshots and the dedup that’s older than your assets folder

Figma documents repeat themselves. The same close-icon is dropped into 30 screens; the same “Preview” thumbnail gets reused across cards; designers paste-and-tweak instead of componentizing. A naive parser would emit 30 distinct PNG assets for the same icon, and the LLM doing code-gen would happily register all 30 with auto-generated names that nobody can search for later.

The dedup pass (extract-material.py) does three things:

  1. Find candidates. Walk the AST, collect every node whose type is in (adapter.svgPictureType, adapter.staticImageType) — i.e., everything that will render as an image asset under this workspace’s stack.
  2. Cluster by similarity. Default is file_hash (byte-level MD5) which catches verbatim duplicates. Optional is imagededup’s phash (perceptual hash, recommended), which catches “same icon, slightly different export pixels” — a real failure mode when designers re-export sub-pixel-aligned vectors. The threshold (0.95 default for phash, equivalent to Hamming distance ≤ 6 on the 64-bit hash) is the empirical sweet spot we landed on after chewing through a few real design files.
  3. Rewrite the AST. Emit image-map.json keyed by <categoryPrefix><TypeName><NameSuffix> — unionIcon, areaRectangle184, etc. — and rewrite each AST node’s screenshot field from a filesystem path to that key. The map keeps { snapshot: "screenshots/...", md5: "..." } so downstream code-gen can do another cross-workspace dedup against existing assets in lib/assets/ or Assets/ before writing.
| Method | Speed | Recall | When to use |
|---|---|---|---|
| file_hash | Fastest | Low | When designers always re-export the same source |
| phash | Fast | High | Default. Robust to sub-pixel rendering noise |
| cnn | Slow | Highest | Visually similar but structurally distinct icons |

The image-map keys are deliberately not meant for human consumption — they’re stable identifiers the AST uses to point at assets. When code-gen actually writes a file into assets/, it picks a business-meaningful name (user_avatar.png, close_icon.svg) based on the node name, the screenshot, and the snapshot’s guide.md. The split keeps the parser deterministic (same input → same keys) while the agent gets to make the human-readable naming decision once, in context.
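The real pass is Python; this TypeScript sketch of the byte-hash path shows the shape of the rewrite (the key construction is a loose imitation of the scheme above, not the actual rule):

import * as crypto from 'node:crypto';
import * as fs from 'node:fs';

interface AstNode { type: string; name: string; screenshot?: string; children?: AstNode[] }

// file_hash clustering: identical bytes collapse to one image-map key.
function dedupScreenshots(roots: AstNode[], assetTypes: Set<string>): Record<string, { snapshot: string; md5: string }> {
	const keyByMd5 = new Map<string, string>();
	const imageMap: Record<string, { snapshot: string; md5: string }> = {};
	let suffix = 0;
	const walk = (node: AstNode) => {
		if (node.screenshot && assetTypes.has(node.type)) {
			const md5 = crypto.createHash('md5').update(fs.readFileSync(node.screenshot)).digest('hex');
			let key = keyByMd5.get(md5);
			if (!key) {
				key = `${node.type.toLowerCase()}${node.name.replace(/\W/g, '')}${suffix++}`; // cf. unionIcon, areaRectangle184
				keyByMd5.set(md5, key);
				imageMap[key] = { snapshot: node.screenshot, md5 };
			}
			node.screenshot = key; // filesystem path becomes a stable key
		}
		node.children?.forEach(walk);
	};
	roots.forEach(walk);
	return imageMap;
}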

The details that don’t fit a flowchart

Some of the more useful design choices don’t show up as boxes:

  • schemaVersion is a wall, not a hint. Every config file written under .hx/figma2code/ carries a schemaVersion. The code-side SCHEMA_VERSION constant lives in extensions/schema-version.json. On every parser/code-gen entry, validateF2CWorkspace compares them; mismatch returns a Failure envelope whose message tells the user verbatim: “delete .hx/figma2code/ and re-run generic-design-init”. The agent passes that message through unchanged. It does not offer to migrate. Migration is a class of bug we don’t take on; users are better off rebuilding workspace state than debugging a half-migrated one.

  • Cache pruning is a tool call, ordered before status sync. Each parser run leaves a new <timestamp>/ snapshot. Without intervention output/<framework>/ would balloon. The skill ends with two ordered tool calls: cleanF2CCache (keep last 10 snapshots, repoint latest.txt) before syncF2CStatus. The order matters — flipping it would briefly leave status.json referencing a snapshot the next call is about to delete. A sketch follows after this list.

  • The lint pass is non-blocking on purpose. After the AST is built but before status sync, the skill prints an AST lint report: nesting depth, redundant single-child wrappers, repeated topology that should have been a Component, hard-coded colors that bypass Variables. It writes nothing, blocks nothing, and refuses to “fix” anything. Designers iterate; if the report flags something, the human decides whether to clean the Figma file and re-parse, or live with the current AST. Quality nudges, not quality gates.

  • Code-gen has a hard scope-confirm loop. Before any file write, generic-code-gen produces a markdown checklist (create / modify / delete / asset, with rationale and node references) and asks the user approve / modify / cancel. Sit in the loop forever if you must. There is no timeout, no “default approve”, no “I think this is what you meant” path. When the LLM tries to wander outside the approved scope mid-write, the rule says: stop, go back to the loop, get a fresh approval. This is the one place where the agent’s normal enthusiasm gets actively suppressed.

  • The event stream is its own protocol. notifyF2CEvent is a registered tool whose payload is a { code, message, payload: { event, data? } } envelope, written into a normal tool_execution_end channel. There are 26 event literals across the three skills, each one a kebab-case string the desktop client decodes into a typed enum. We didn’t invent a new socket — we stole a tool slot and used it as a status bus. More on that in Section 2.
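The pruning step from the cache bullet above, sketched. Keep-last-10 and the latest.txt repoint are from the skill's contract; everything else here is illustrative:

import * as fs from 'node:fs';
import * as path from 'node:path';

const KEEP = 10;

// Must run before syncF2CStatus, so status.json never points at a deleted snapshot.
function cleanF2CCache(frameworkDir: string): void {
	const snapshots = fs.readdirSync(frameworkDir)
		.filter((name) => fs.statSync(path.join(frameworkDir, name)).isDirectory())
		.sort(); // timestamp directory names sort chronologically
	for (const stale of snapshots.slice(0, Math.max(0, snapshots.length - KEEP))) {
		fs.rmSync(path.join(frameworkDir, stale), { recursive: true, force: true });
	}
	const latest = snapshots.at(-1);
	if (latest) fs.writeFileSync(path.join(frameworkDir, 'latest.txt'), latest);
}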

generic-code-gen, where the AST goes back to being source code

The parser is upstream; code-gen is the downstream consumer that turns its outputs into actual files in the user’s project. The skill’s contract is unusually narrow: read seven things, write only inside an approved scope, and call one lifecycle hook on the way out.

Its inputs split cleanly into two tiers — snapshot-scoped (what this design says) and workspace-scoped (how this project speaks):

| Tier | File | Role at code-gen time |
|---|---|---|
| Snapshot | <snapshot>/guide.md | The --description the user gave at parse time. Names the task (“product detail v3”), used to disambiguate when the AST has multiple frames and to seed commit-message-grade context. |
| Snapshot | <snapshot>/semantic-ast.json | Primary structure: nodes, layout, inline style, text. Every type literal here is interpreted via the workspace AST schema doc, not parser source. |
| Snapshot | <snapshot>/design-variables.json | Resolved Design Tokens. Code-gen maps these to the target stack’s idiom — ThemeData extension in Flutter, :root { --color-* } in CSS, static Color constants in Avalonia. |
| Snapshot | <snapshot>/screenshots/*.png | Visual cross-check. The agent reads them when AST-only inspection is ambiguous (e.g. “is this gap really 16, or am I miscounting padding?”). |
| Snapshot | <snapshot>/image-map.json | key → { snapshot, md5 }. AST nodes’ screenshot fields point at these keys; the md5 powers the cross-workspace asset dedup discussed below. |
| Workspace | <workspace>/.hx/figma2code/config.json | Source of truth for language / framework. Code-gen does not infer the stack from project files — the init skill already decided. |
| Workspace | <workspace>/<config.guidePath> (default guide.md) | Project-level long-form description (tech stack, accessibility goals, naming preferences). Read on every run as the long-term baseline; the snapshot guide is the short-term overlay. |
| Workspace | <workspace>/<config.astSyntaxPath> (astSyntaxAutoGenerated.md) | The only authority on what every type / enum / color literal in the AST means in this workspace’s stack. |
| Workspace | <workspace>/<config.codingStandardPath> | Coding conventions — either the user-provided file or the init-fallback codingStandardAutoGenerated.md. Drives naming, file layout, dependency style. |
| Workspace | <workspace>/.hx/figma2code/extensions/generic-design-parser.ts | The same FrameworkAdapter the parser used. Code-gen reads it for the source-of-truth color/layout/type mappings rather than re-deriving them. |
| Workspace | <workspace>/.hx/figma2code/extensions/generic-design-gen.ts | The code-gen adapter. Exactly one hook: postCodeGenFinished(isSuccess, message). Empty by default. |

The execution itself is twelve named steps in SKILL.md, but the ones that actually shape behavior are these:

  1. Validate, then load both guides. validateF2CWorkspace first — same schemaVersion wall as the parser. Then read the workspace-level guide.md (long-term baseline) and the snapshot-level guide.md (this task’s intent). When the agent has to decide “which frame in this AST is the user actually asking me to render?”, these two together resolve it.
  2. Resolve types through the schema doc, not the parser. Every time the agent sees "type": "Row" in the AST, it looks up Row in astSyntaxAutoGenerated.md to find the target-stack widget name, the layout enum literals, and the color-literal example. This is why the schema doc is rendered at init time from the adapter — it’s the contract code-gen will read at every invocation, never the parser source.
  3. Plan first, write later, on the user’s say-so. Before any file write, the agent produces a markdown checklist (create / modify / delete / asset per row, with the workspace-relative path, the rationale, and the source AST node IDs) and presents it via AskUserQuestion with three buttons: approve, modify, cancel. The skill loops on modify indefinitely — there’s no timeout, no “I think this is what you meant” silent approval, no implicit upgrade of cancel to approve after N rounds. If, mid-write, the agent realizes it needs to touch a file outside the approved scope, it must stop and go back to step 3 with a fresh checklist. The hard gate exists because the LLM’s natural instinct — “fix while you’re in the area” — is exactly the failure mode that turns code-gen runs into unreviewable churn.
  4. Write in place, into paths the user named. No staging directory, no shadow copy, no pull-request branch. The file the agent edits is the file the user opens in their editor; the diff the user reviews is the diff their VCS shows them. “Place” matters — the agent is forbidden from touching anything outside the approved scope (item 3 enforces this).
  5. Asset dedup against the workspace, not just the snapshot. The parser’s image-map.json already deduped within a snapshot. Code-gen does a second pass: before writing a new asset into assets/ or wherever, it MD5s every existing image file in the workspace and compares against the md5 field of each image-map entry. A match means “reuse the existing path, don’t write”. Only fresh hashes get a new file, and only at code-gen time does the agent pick a human-readable filename (user_avatar.png, close_icon.svg) using the screenshot, the AST node name, and the snapshot guide — the parser’s unionIcon / areaRectangle184 keys are stable identifiers, never user-facing. A sketch of the hash check follows after this list.
  6. Sync, then run the post-hook, then announce done. The last three calls are syncF2CStatus (writes status.json so the desktop can re-render off filesystem state) → await adapter.postCodeGenFinished(isSuccess, message) (the workspace’s chance to kick off flutter run, trigger a hot reload, run a smoke build) → notifyF2CEvent code-gen-done. The hook can throw — the agent catches and continues to done regardless. Failures in the post-hook are workspace concerns, not skill concerns.
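The hash check from step 5, sketched (helper names and the extension filter are mine):

import * as crypto from 'node:crypto';
import * as fs from 'node:fs';
import * as path from 'node:path';

function md5Of(file: string): string {
	return crypto.createHash('md5').update(fs.readFileSync(file)).digest('hex');
}

// Reuse an existing asset when its bytes already live somewhere in the workspace.
function findExistingAsset(workspace: string, wantedMd5: string): string | undefined {
	const stack = [workspace];
	while (stack.length > 0) {
		const dir = stack.pop()!;
		for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
			const full = path.join(dir, entry.name);
			if (entry.isDirectory()) {
				if (entry.name !== '.git') stack.push(full);
			} else if (/\.(png|jpe?g|svg|webp)$/i.test(entry.name) && md5Of(full) === wantedMd5) {
				return full; // match: reuse this path, write nothing new
			}
		}
	}
	return undefined; // fresh hash: write a new, human-named file
}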
flowchart TB
    subgraph Snap["<snapshot>/ (parser output)"]
        S1[guide.md]
        S2[semantic-ast.json]
        S3[design-variables.json]
        S4[screenshots/]
        S5[image-map.json]
    end
    subgraph WS["<workspace>/.hx/figma2code/"]
        W1[config.json]
        W2[guide.md]
        W3[astSyntaxAutoGenerated.md]
        W4[codingStandard...]
        W5[generic-design-parser.ts]
        W6[generic-design-gen.ts]
    end
    subgraph Gen["generic-code-gen execution"]
        G1["validate workspace<br/>load both guides"]
        G2["interpret AST<br/>via schema doc"]
        G3["plan checklist<br/>approve / modify / cancel loop"]
        G4["write in place<br/>asset md5 dedup vs workspace"]
        G5["syncF2CStatus<br/>+ postCodeGenFinished hook"]
    end
    P[(target project<br/>files in place)]
    S1 --> G1
    W1 --> G1
    W2 --> G1
    S2 --> G2
    W3 --> G2
    W5 --> G2
    S3 --> G2
    G2 --> G3
    S4 --> G3
    W4 --> G3
    G3 --approve--> G4
    S5 --> G4
    G4 --> P
    G4 --> G5
    W6 --> G5
    style G3 fill:#fde2e2,stroke:#a55
    style P fill:#dff7e0,stroke:#7a8

Notice what’s not in the loop: the agent never re-fetches Figma, never re-runs the parser, never tries to “refresh” anything. The snapshot is the immutable source-of-truth for this generation; if the design changed, the user runs the parser again, which produces a new snapshot, which becomes the input to a new code-gen invocation. That immutability is what makes the skill safely re-runnable on the same snapshot — every call sees the exact same inputs and produces the exact same checklist (modulo LLM nondeterminism), so the user can iterate on the scope (“actually, also touch the theme file”) without worrying that the underlying design data is shifting under them.

Section 2 — The desktop layer: agent-as-backend

Most desktop apps that “talk to an LLM” treat the model as one capability among many, behind a service layer that also hits a database, a file system, a remote API. The FigmaToCode workload is more extreme: the agent is the entire backend. There is no FigmaService, no SnapshotService, no REST proxy. Every meaningful state change in <workspace>/.hx/figma2code/ happens because the agent ran a skill. The desktop’s job is to (a) gather inputs, (b) hand them to the agent, (c) watch the workspace, (d) re-render.

This section walks through that loop, the contracts that hold it together, and a few specific design merits that earn their keep.

The 30-second tour of the data flow

sequenceDiagram
    autonumber
    participant U as User
    participant V as FigmaToCodeViewModel
    participant B as AgentContextBuffer
    participant C as IAgentController
    participant A as Agent (figma2code)
    participant D as AgentDeltaParser
    participant P as ProgressIndicator
    participant W as Workspace fs
    U->>V: Submit "Import snapshot"
    V->>B: AppendRange(figma2code.parser.* entries)
    V->>P: Show(sequence: DesignParser)
    V->>C: AsyncInputMessage(triggerPrompt)
    Note over C,A: framework calls IContextProvider.AsyncQueryContext()
    C->>B: Flush() → list of ContextEntry
    C->>A: deliver context + prompt
    loop for every lifecycle phase
        A-->>D: stream delta with notifyF2CEvent envelope
        D-->>V: F2CAgentEnvelope
        V->>P: UpdateEvent(F2CAgentEvent)
    end
    A-->>W: writes config / snapshot / code (in place)
    A-->>C: stream end
    C-->>V: AsyncUpdateContext(...)
    V->>W: re-scan .hx/figma2code/
    V->>P: Hide()

A few things are doing a lot of work in that diagram.

The two-track parameter contract

When the user clicks “Import snapshot”, the view-model has both a structured request object (SnapshotImportRequest { WorkspacePath, FigmaUrl, Alias, Description }) and a free-text triggerPrompt for the agent. The desktop sends both, on two tracks:

| Track | Payload | Consumer |
|---|---|---|
| ContextEntry list | figma2code.parser.workspace_path = "/Users/.../proj", etc. | The agent’s “read context” path — keys are documented in SKILL.md. |
| triggerPrompt | A natural-language message listing the same params as bullet points. | The agent’s IAgentController.AsyncInputMessage text channel. |

Why both? Because the LLM might preferentially read either, depending on its tuning. The SKILL.md spells out a “context-first, prompt-second” rule: if every required key is present and non-empty in ContextEntry, the agent must not ask the user for it again via AskUserQuestion. The trigger prompt is a fallback narration in case the agent ignores the structured channel; it’s also a hint to the agent about which skill to invoke. The two tracks are redundant on purpose — defensive design against an LLM that happens to glance at one channel and not the other.

// Service/AgentRequestService.cs (excerpt)
List<ContextEntry> entries = [
    Attr("figma2code.parser.workspace_path", request.WorkspacePath),
    Attr("figma2code.parser.figma_url",       request.FigmaUrl),
    Attr("figma2code.parser.alias",           alias),
    Attr("figma2code.parser.description",     description),
];
_contextBuffer.AppendRange(entries);

string prompt =
    L.Get("F2C_Agent_Prompt_ParserHeader") + "\n" +
    L.Format("F2C_Agent_Prompt_LabelWorkspace", request.WorkspacePath) + "\n" +
    L.Format("F2C_Agent_Prompt_LabelFigmaUrl",  request.FigmaUrl) + "\n" +
    L.Format("F2C_Agent_Prompt_LabelAlias",     alias) + "\n" +
    L.Format("F2C_Agent_Prompt_LabelDescription", description);

await controller.AsyncInputMessage(BuildRequestMessageJson(prompt));

The header sentence ("Trigger skill:generic-design-parser; arguments below are complete...") is localized so the agent answers in the user’s UI language. The keys themselves (workspace, snapshot, alias) stay ASCII across all locales — the agent has been tuned to recognize them as parameter names, not vocabulary. UI culture flips the prose, not the keys.

The buffer + flush contract

AgentContextBuffer is twenty lines of code and pulls a surprising amount of weight:

internal sealed class AgentContextBuffer {
    private readonly object _gate = new();
    private readonly List<ContextEntry> _entries = new();

    public void AppendRange(IEnumerable<ContextEntry>? entries) {
        if (entries == null) return;
        lock (_gate) { _entries.AddRange(entries); }
    }

    public List<ContextEntry>? Flush() {
        lock (_gate) {
            if (_entries.Count == 0) return null;
            var snapshot = new List<ContextEntry>(_entries);
            _entries.Clear();
            return snapshot;
        }
    }
}

The shape of this object encodes a real invariant: context is delivered exactly once, on the host’s terms. The view-model Appends. The agent host calls IContextProvider.AsyncQueryContext right before sending a request. That call is what flushes the buffer; nothing else can read it. There is no cache, no history, no replay. This means:

  • Multiple Submit*Async calls before a flush get coalesced. If a user fires a parser request and then a code-gen request before the agent has actually started, both context sets land in the same agent turn. (In practice the UI prevents this, but the buffer doesn’t have to know.)
  • If the user cancels mid-flight (RequestStopAgentAsync), the buffer is flushed and discarded manually. The next agent turn starts clean.
  • The locking is intentionally a single-mutex over a List<>. We measured: payloads are tens of entries, calls are rare. A ConcurrentQueue would have made Flush non-atomic, which is the one operation that has to be all-or-nothing.

The delta channel: streaming progress without inventing a protocol

The agent emits 26 distinct lifecycle events. The desktop wants to render them as “stage 4 of 9, currently parsing AST…” without polling. The trick is that pi (the agent host) already has a tool_execution_end channel that streams every tool call result back to the client. So we make event-reporting itself a tool:

// agents/figma2code/extensions/tools/notify_f2c_event.ts (sketch)
registerTool('notifyF2CEvent', async ({ event, code, message, data }) => {
	validateAgainstSchema(event, code, data);
	return JSON.stringify({
		code,
		message,
		payload: code === 0 ? { event, data } : {}
	});
});

The agent calls notifyF2CEvent whenever it enters a new phase. The tool’s result is the envelope. The desktop catches it on the way out:

// AgentDeltaParser.cs — runs on every stream delta
public IReadOnlyList<F2CAgentEnvelope> Parse(string? delta) {
    // 1. Parse delta as JSON, look for tool_call array.
    // 2. For each entry, track id → name across start/update/end (hx client doesn't
    //    re-emit toolName on later phases).
    // 3. When type == "end" && toolName == "notifyF2CEvent", deserialize the output
    //    string into F2CAgentEnvelope and yield it.
}

Then FigmaToCodeViewModel.OnAgentDeltaAsync walks the envelopes and Posts them to the UI thread, where AgentProgressIndicatorViewModel.UpdateEvent(F2CAgentEvent) recomputes the step list. Each skill has a fixed sequence (F2CAgentEventSequences.DesignParser, ...CodeGen, ...Init); the indicator just IndexOfs the current event and marks everything before it Done, the current one InProgress, the rest Pending.
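The indicator lives in C#, but the recompute is small enough to show as a TypeScript sketch. The step-state names mirror the prose above; the event literals other than design-parser-fetching-design are assumptions:

type StepState = 'Done' | 'InProgress' | 'Pending';

// One fixed sequence per skill; the indicator never invents steps.
const designParserSequence = ['design-parser-start', 'design-parser-fetching-design', /* … */ 'design-parser-done'];

function recomputeSteps(sequence: string[], currentEvent: string): StepState[] {
	const i = sequence.indexOf(currentEvent);
	if (i < 0) return sequence.map(() => 'Pending'); // not in this sequence: likely a skill switch
	return sequence.map((_, j) => (j < i ? 'Done' : j === i ? 'InProgress' : 'Pending'));
}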

There’s one fun detail: when the user types into the chat directly instead of clicking a button, the desktop doesn’t know which skill will run. So IContextProvider.AsyncQueryContext checks: if the progress indicator isn’t visible yet, open it with a generic AgentRunning sequence (one step, “Running…”). The first real lifecycle event that arrives — say, design-parser-fetching-design — triggers UpdateEvent, which detects the sequence change (InferSequenceFromEvent) and swaps the entire step list to the right skill’s sequence. The transition is lossless and seamless.

stateDiagram-v2
    [*] --> Hidden
    Hidden --> Generic: AsyncQueryContext (chat path)
    Hidden --> SkillSpecific: Submit*Async (button path)
    Generic --> SkillSpecific: first F2CAgentEvent arrives
    SkillSpecific --> SkillSpecific: UpdateEvent(next)
    SkillSpecific --> Hidden: AsyncUpdateContext (skill done)
    SkillSpecific --> Hidden: RequestStopAgentAsync (user cancel)
    note right of Generic
        sequence = [AgentRunning]
        single step "Running…"
    end note
    note right of SkillSpecific
        sequence = Init / DesignParser / CodeGen
        steps drive ▶ ✓ ◦ glyphs + label colors
    end note

The view-model boundary: four interfaces, no implementations leaked

The UI assembly references the view-model assembly for one reason only: to know what shape the data context has. It does this through four small interfaces:

| Interface | Role |
|---|---|
| IFigmaToCodeViewModel | Observable surface: InitializationStatus, ProjectBrowser, SnapshotSection, the Submit*Async methods. |
| IFileTreeViewModel | Defined in Nut.Base.Gui, implemented by ProjectBrowserViewModel. Drives the file tree. |
| ICodeEditorViewModel | Same — drives the read-only code viewer. Same VM implements both, so click-to-open is automatic. |
| IAgentDeltaSink | Lets the host plug the delta stream into us via OnAgentDeltaAsync(string). |

The XAML side never imports a class from the view-model assembly. Bindings are typed via x:DataType="abs:IFigmaToCodeViewModel", the file tree is a nb:FileTreePanel whose DataContext is just object? from the interface’s perspective, and the runtime resolves the actual implementation. This means we can refactor the view-model freely (split classes, change namespaces, swap CommunityToolkit.Mvvm for something else) without recompiling a single XAML file.

The [ObservableProperty] source generator does the boilerplate; one-way bindings cover most of the surface, and the few Mode=TwoWay bindings (workspace path input, P4 fields, snapshot selection) flow back through the same property setters that drive Submit*Async.

File tree, snapshots, and the watch-the-workspace pattern

When the agent finishes, it leaves files behind. The desktop notices via two paths:

  1. IContextProvider.AsyncUpdateContext is called by the host at the end of every agent turn. The view-model uses this as the canonical “rescan now” signal: re-load WorkDataModel (which checks for config.json / status / framework dir), refresh SnapshotSectionViewModel.Refresh(), refresh the project browser’s VCS scan, hide the progress indicator. One synchronous-looking method, all the workspace state-sync work funneled through it.
  2. The IFileTreeViewModel change scanner (GitChangeScanner / P4VChangeScanner) is deliberately not an inotify watcher. It’s a one-shot scan triggered by the same AsyncUpdateContext, plus user-driven refresh. Watchers are noisy (compile output, gitignored IDE files, cache flushes), and we’d rather rescan once after a known event than chase thousands of false positives. The scanner produces ChangedFile records with ChangeKind ∈ { Added, Modified, Deleted }, which color-code the file tree nodes (green / yellow / red dots) so the user can visually verify what the last skill touched.
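A sketch of the one-shot git scan, built on git status --porcelain (the two-letter status parsing is standard git plumbing; the shape around it is mine, and the real scanner is C#):

import { execFileSync } from 'node:child_process';

type ChangeKind = 'Added' | 'Modified' | 'Deleted';
interface ChangedFile { path: string; kind: ChangeKind }

// One-shot, event-driven scan: no inotify watcher, no polling.
function scanGitChanges(workspace: string): ChangedFile[] {
	const out = execFileSync('git', ['status', '--porcelain'], { cwd: workspace, encoding: 'utf8' });
	const changes: ChangedFile[] = [];
	for (const line of out.split('\n')) {
		if (line.trim() === '') continue;
		const status = line.slice(0, 2); // the XY status columns
		const file = line.slice(3);
		if (status.includes('D')) changes.push({ path: file, kind: 'Deleted' });
		else if (status === '??' || status.includes('A')) changes.push({ path: file, kind: 'Added' });
		else changes.push({ path: file, kind: 'Modified' });
	}
	return changes;
}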

The snapshot list works the same way: SnapshotLoader does pure disk IO (read design-context.json, count screenshots, count image-map.json keys) on a background thread, posts results back to the UI thread, and ListBox redraws. No background polling, no file watcher — re-render happens when the user did something or when the agent finished something.
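The loader is nothing but disk reads; a sketch under Section 1's snapshot layout (the alias field on design-context.json is an assumption, and the real loader is C#):

import * as fs from 'node:fs';
import * as path from 'node:path';

interface SnapshotSummary { timestamp: string; alias?: string; screenshots: number; imageKeys: number }

// Pure IO on a background thread; results get posted back to the UI thread.
function loadSnapshots(frameworkDir: string): SnapshotSummary[] {
	return fs.readdirSync(frameworkDir, { withFileTypes: true })
		.filter((e) => e.isDirectory())
		.map((e) => {
			const dir = path.join(frameworkDir, e.name);
			const ctxPath = path.join(dir, 'design-context.json');
			const ctx = fs.existsSync(ctxPath) ? JSON.parse(fs.readFileSync(ctxPath, 'utf8')) : {};
			const shotsDir = path.join(dir, 'screenshots');
			const mapPath = path.join(dir, 'image-map.json');
			return {
				timestamp: e.name,
				alias: ctx.alias, // assumed field
				screenshots: fs.existsSync(shotsDir) ? fs.readdirSync(shotsDir).filter((f) => f.endsWith('.png')).length : 0,
				imageKeys: fs.existsSync(mapPath) ? Object.keys(JSON.parse(fs.readFileSync(mapPath, 'utf8'))).length : 0,
			};
		});
}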

Localization: prose flips, keys don’t

A small but earnest detail. The same AgentRequestService that builds trigger prompts for the agent uses the workload’s resx files (Strings.{en,zh-CN,ja,ko}.resx). Two policies coexist:

  • User-facing sentences (header lines, fallback descriptions, the “please explain…” template) are translated. The agent ends up answering in the same language the UI is in, because its trigger prompt arrived in that language.
  • Machine-readable keys (workspace, snapshot, alias, vcs, p4 port, figma url, description, change scope, change kind, change range) stay ASCII across every locale. The agent has been tuned to recognize them as parameter names; flipping them to translated Chinese / Japanese / Korean would break key recognition without buying anything for the user.

So the resx for F2C_Agent_Prompt_LabelWorkspace is "- workspace: {0}" in all four files. The header above it (F2C_Agent_Prompt_ParserHeader) is fully translated. This is the only sane split: localize what the user reads, freeze what the LLM parses.

What pays for itself

A few design merits on the desktop side that I’d port to the next agent-backed app without hesitation:

  • The agent is the only writer. The desktop reads <workspace>/.hx/figma2code/ and watches it; it never writes. This collapses an entire class of “UI and agent disagree about state” bugs into “the agent is the source of truth, period.”

  • One indicator, three skills, four states. AgentProgressIndicatorViewModel is one ~230 LoC observable that handles: explicit Show (button path), passive Show (chat path with sequence inference), skill switch mid-stream, indeterminate fallback, user-confirm-then-Hide. Everything the user sees during a 30-second agent run flows through that one VM.

  • ContextEntry + trigger prompt as belt-and-suspenders. The dual-track parameter contract cost an extra dozen lines per submit method and bought us robustness against LLM tuning drift, which has actually saved us more than once when we swapped models.

  • Refusing to migrate schemaVersion. Telling users “delete this directory and re-init” feels rude until you realize the alternative is migration code that nobody tests because schema bumps are rare. Forcing the cliff keeps the agent’s invariants strong.

Wrap

The agent layer earns its keep by separating generic walking from framework knowledge through two adapter files the workspace owns. The desktop layer earns its keep by treating the agent as its sole backend — passing parameters on two tracks, draining lifecycle events through a hijacked tool channel, and re-rendering off filesystem state the agent owns. Both halves are small enough to read end-to-end in an afternoon, and the contract between them is two JSON schemas plus a 26-entry enum.

If I had to give it a slogan: a strict generic core, a permissive set of hooks, and a UI that trusts the workspace as its own database. That’s the whole shape.
