Figma → Code, The Hard Way

Most “Figma to code” tools are vertical: pick a stack (usually React or Flutter), hard-code the mapping, ship a plugin. They work — until your codebase isn’t React or Flutter, or until your team’s component library, naming conventions, and design tokens don’t match the canned ones.
Over the past few months I built a different shape of the same tool: a stack-agnostic agent that treats the target framework as an injection point, paired with a desktop client that uses the agent itself as its backend. This post walks through both halves — the agent skill design and how it adapts to arbitrary stacks (Section 1), and the desktop UI that streams lifecycle events back from the agent and never opens a single REST endpoint of its own (Section 2).

The agent code lives in `devhub_devharness/agents/figma2code`; the desktop client lives under `project/desktop/.../FigmaToCode`. Diagrams below are rendered with Mermaid; the loader is injected at the bottom of this post.
Section 1 — The agent layer
Why three skills, not one
The agent exposes three independently runnable skills:
| Skill | Idempotent over… | What it touches |
|---|---|---|
| `generic-design-init` | a workspace (once per project) | writes `<workspace>/.hx/figma2code/` — config, adapters, schema docs |
| `generic-design-parser` | a Figma URL (once per snapshot) | writes `<workspace>/.hx/figma2code/output/<framework>/<timestamp>/` |
| `generic-code-gen` | a snapshot + change scope | edits the workspace in place |
Splitting them isn’t bookkeeping — it’s a dependency contract. Parser and code-gen both
assume the workspace is initialized; they validate that assumption with a validateF2CWorkspace
tool call at boot, and refuse to run if config.json is missing or its schemaVersion drifted
from the code-side SCHEMA_VERSION. There is no graceful-degradation path. Either the workspace is
in a known-good shape, or you go re-run init.
This is unfashionable for an LLM tool — agents love to recover from missing files by guessing — but the alternative is silent corruption of someone’s actual project. The cost of “fail loud” is one extra hop; the cost of “be helpful” is a rewrite that doesn’t compile.
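The gate itself is small. Here is a minimal sketch of the fail-loud check, assuming `config.json` has already been parsed into an object — the constant's value and the name `validateConfig` are illustrative, not the real tool's API:

```typescript
// A minimal sketch of the fail-loud gate. SCHEMA_VERSION's value and the
// function name validateConfig are illustrative, not the real tool's API.
const SCHEMA_VERSION = 3; // hypothetical code-side constant

type ValidationResult = { ok: true } | { ok: false; message: string };

function validateConfig(config: { schemaVersion?: number } | null): ValidationResult {
  if (config === null || config.schemaVersion === undefined) {
    // Missing config.json (or no version field) — same failure, same advice.
    return { ok: false, message: "config.json missing — re-run generic-design-init" };
  }
  if (config.schemaVersion !== SCHEMA_VERSION) {
    // Deliberately no migration path: delete .hx/figma2code/ and re-init.
    return {
      ok: false,
      message: `schemaVersion ${config.schemaVersion} drifted from ${SCHEMA_VERSION} — delete .hx/figma2code/ and re-run generic-design-init`,
    };
  }
  return { ok: true };
}
```

The point of the shape is that there are exactly two outcomes — a pass, or a message the agent relays verbatim.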
The agent is also VCS-aware from the very first skill. Init asks the user which version-control
system the project uses — strictly one of git / p4v / none, no free-form input — and the
answer rides along through config.json’s vcs field. Two things follow from that:
- Perforce credentials get a separate, gitignored home. When the user picks `p4v`, init collects `P4PORT`/`P4USER`/`P4CLIENT`, writes them to `<workspace>/.hx/figma2code/.p4config` in standard P4CONFIG format, and idempotently appends that path to `<workspace>/.gitignore`. The credentials never enter `config.json`, never get committed, and never appear in any prompt the agent sends to the LLM. Only the fact that vcs is `p4v` is public; the keys themselves stay local.
- Code-gen edits in place so the user’s existing review tooling owns the diff. The skill never opens a branch, never stages a commit, never runs `git add`. It writes files; the user sees them as `M`/`??`/`A` in their normal `git status` or P4 changelist. The desktop’s file tree, in turn, runs a one-shot `GitChangeScanner`/`P4VChangeScanner` after each agent turn and color-codes nodes by `ChangeKind` so the user can visually verify what the last skill touched. The agent and the version-control system stay in their respective lanes — the agent owns what changed, the VCS owns what gets committed.
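The gitignore append has to be idempotent — running init twice must not duplicate the entry. A sketch of that guard as a pure string transform (the real skill wraps file IO around it; `ensureGitignoreEntry` is a hypothetical name):

```typescript
// Idempotent .gitignore append, reduced to a pure string transform — the real
// init skill does the read/write around it. ensureGitignoreEntry is a
// hypothetical name.
function ensureGitignoreEntry(gitignore: string, entry: string): string {
  const lines = gitignore.split("\n").map((l) => l.trim());
  if (lines.includes(entry)) return gitignore; // already ignored — second init is a no-op
  const sep = gitignore.length === 0 || gitignore.endsWith("\n") ? "" : "\n";
  return gitignore + sep + entry + "\n";
}
```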
```mermaid
flowchart LR
    A["generic-design-init<br/>(once per workspace)"] --> B["generic-design-parser<br/>(once per snapshot)"]
    B --> C["generic-code-gen<br/>(once per change scope)"]
    A -.writes.-> W["<workspace>/.hx/figma2code/<br/>config.json + adapters + schema doc"]
    B -.writes.-> S["output/<framework>/<timestamp>/<br/>semantic-ast.json + screenshots/ + image-map.json"]
    C -.edits in place.-> P["target project files"]
    W -.gates.-> B
    W -.gates.-> C
    S -.consumed by.-> C
    style W fill:#fff7d6,stroke:#aa9
    style S fill:#e6f4ff,stroke:#79a
    style P fill:#dff7e0,stroke:#7a8
```
The “generic flow + workspace adapter” pattern
The whole architecture rests on a single observation: almost every step in a Figma-to-code pipeline is framework-independent. Fetching a node tree, walking it, exporting screenshots, deduping images, normalizing colors to RGBA — none of those care whether you ship Flutter or SwiftUI. What does care is a small set of decisions clustered at the boundaries:
- How does a `{r: 1, g: 0.5, b: 0, a: 1}` color render as a string? `Color(0xffff8000)` for Flutter, `rgb(255, 128, 0)` for CSS, `#FFFF8000` for Avalonia.
- What does `layoutMode: HORIZONTAL` map to? `"row"` in Flutter, `"row"` in Avalonia, `flex-direction: row` in CSS.
- What’s a vector group called when you flatten it? `SvgPicture` in Flutter, `Image` (with svg source) in Compose, `<svg>` in web.
So the parser ships a strict interface — FrameworkAdapter — and asks each workspace to
provide its own implementation:
```typescript
export interface FrameworkAdapter {
  transformColor(rgba: RGBA): string;
  layoutMaps: LayoutMaps; // direction / mainAxisAlign / crossAxisAlign / mainAxisSize
  mapNodeType(figmaType: string, ctx: NodeTypeContext): string | undefined;
  svgPictureType: string;   // must equal mapNodeType('VECTOR', ...)
  staticImageType: string;  // must equal mapNodeType('IMAGE', ...)
  postAstTokenGenerated(astPath: string): Promise<void>; // lifecycle hook
}
```
The adapter file lives at <workspace>/.hx/figma2code/extensions/generic-design-parser.ts. The
parser dynamically import()s it at runtime and validates the five required fields before doing
anything. Three things make this work in practice:
- The `init` skill writes the adapter, not the user. When a workspace is first initialized, the agent reads the project’s actual code (`pubspec.yaml`, `*.cs`, whatever), figures out the stack, and generates the adapter from a template + the workspace’s existing conventions. The user gets a working file they can later hand-edit; they don’t start from a blank page.
- The AST schema doc is rendered from the adapter. A second template, `astSyntaxAutoGenerated.md`, has 17 placeholders. The init skill calls each adapter method and substitutes real values. Code-gen later reads this file as the authority on what `type: "Row"` means in this workspace — never the parser source code, never a hard-coded table.
- A second adapter exists for code-gen. `generic-design-gen.ts` exposes one hook, `postCodeGenFinished(isSuccess, message)`, called after the skill finishes writing files. The default implementation is empty; teams use it to kick off `flutter run`, trigger a hot reload, or run `dotnet build` for a smoke check. It’s a deliberate echo of `postAstTokenGenerated` — a single, named lifecycle hook the user can specialize without touching the skill internals.
The trick is pushing all framework knowledge into two ~150-line TypeScript files that the workspace owns, and keeping the skill itself a generic walker.
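To make the shape concrete, here is what a workspace-owned adapter might look like for a Flutter project — a hand-written sketch against the interface above, not the file the init skill actually generates. `RGBA` uses Figma's 0..1 channel floats; the mappings mirror the boundary examples earlier:

```typescript
// A hand-written sketch of a Flutter-flavored adapter. The object itself is
// illustrative; the real file is generated by init from the project's conventions.
type RGBA = { r: number; g: number; b: number; a: number };

function channel(v: number): string {
  // 0..1 float → two lowercase hex digits
  return Math.round(v * 255).toString(16).padStart(2, "0");
}

const flutterAdapter = {
  // {r: 1, g: 0.5, b: 0, a: 1} → "Color(0xffff8000)"
  transformColor(c: RGBA): string {
    return `Color(0x${channel(c.a)}${channel(c.r)}${channel(c.g)}${channel(c.b)})`;
  },
  layoutMaps: {
    direction: { HORIZONTAL: "row", VERTICAL: "column" },
  },
  mapNodeType(figmaType: string): string | undefined {
    const table: Record<string, string> = { VECTOR: "SvgPicture", IMAGE: "Image", TEXT: "Text" };
    return table[figmaType];
  },
  svgPictureType: "SvgPicture", // must equal mapNodeType('VECTOR', ...)
  staticImageType: "Image",     // must equal mapNodeType('IMAGE', ...)
};
```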
Figma API and the Design Token bridge
The parser pulls three things from Figma’s REST API:
- `GET /v1/files/:key` — the document tree, with `boundVariables` references inlined.
- `GET /v1/images/:key?ids=...` — PNG renders for vector groups, image fills, and any node we intend to render as an asset.
- `GET /v1/files/:key/variables/local` — the Design Variables system, which is Figma’s first-class token store (colors, numbers, strings, booleans, with mode-aware values).
The third one is the interesting one. Figma’s Variables API gives you typed tokens grouped by
mode (light/dark, brand A/brand B, …), and the document references them via boundVariables.
A button fill that says boundVariables.fills[0] = { type: "VARIABLE_ALIAS", id: "VariableID:42:1" }
is telling you “this fill is whatever Color/Primary resolves to” — not a hard-coded #3FD47A.
The parser turns this into a workspace-side design-variables.json with two passes:
1. Resolve aliases: chase VARIABLE_ALIAS chains up to 6 hops, write the leaf value back
into the AST node's color field directly. (No more aliases in semantic-ast.json.)
2. Emit the token table: flatten all variables into { name, type, mode, value } records,
route value through adapter.transformColor for COLOR types so the output is already
in target-stack literals.
Why resolve aliases at parse time instead of at code-gen time? Because the rendering phase needs concrete colors to do the screenshot dedup correctly (two nodes referencing the same alias should produce identical bitmaps), and because the resulting AST is then trivially consumable by any downstream tool — including non-LLM tools — that doesn’t speak Figma’s variable schema.
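Pass 1 is a bounded chain walk. A sketch, with the variable payload simplified to the two cases that matter here (`resolveAlias` and the shapes are illustrative, not the parser's actual types):

```typescript
// Sketch of pass 1: chase VARIABLE_ALIAS chains to a leaf, capped at 6 hops.
// The variable payload is simplified from Figma's Variables schema.
type VarValue =
  | { type: "VARIABLE_ALIAS"; id: string }
  | { type: "COLOR"; value: string };

function resolveAlias(vars: Map<string, VarValue>, id: string, maxHops = 6): VarValue {
  let current = vars.get(id);
  for (let hop = 0; hop < maxHops; hop++) {
    if (current === undefined) throw new Error(`dangling variable reference: ${id}`);
    if (current.type !== "VARIABLE_ALIAS") return current; // reached a leaf value
    current = vars.get(current.id);
  }
  // A cycle or an absurdly deep chain — fail loud rather than loop forever.
  throw new Error(`alias chain for ${id} exceeds ${maxHops} hops`);
}
```

The hop cap is what turns a malformed alias cycle into an error instead of a hang.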
The code-gen skill consumes design-variables.json directly and is expected to map tokens to the
target stack’s idiomatic representation: a ThemeData extension in Flutter, a static class of
Color constants in Avalonia, a CSS :root { --color-primary: ... } block in web. The mapping is
the agent’s job during code-gen — not the parser’s. The parser only resolves and flattens.
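For the CSS idiom, that mapping is almost mechanical. A sketch, assuming the flattened `{ name, type, mode, value }` records described above (`toCssRoot` is a hypothetical helper, not part of the skill):

```typescript
// Sketch of the CSS idiom: flatten COLOR tokens for one mode into a :root
// block of custom properties. toCssRoot is a hypothetical helper.
type TokenRecord = { name: string; type: string; mode: string; value: string };

function toCssRoot(tokens: TokenRecord[], mode: string): string {
  const lines = tokens
    .filter((t) => t.type === "COLOR" && t.mode === mode)
    // "Color/Primary" → "--color-primary"
    .map((t) => `  --${t.name.toLowerCase().replace(/[/\s]+/g, "-")}: ${t.value};`);
  return `:root {\n${lines.join("\n")}\n}`;
}
```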
```mermaid
flowchart LR
    subgraph Figma["Figma REST API"]
        F1["GET /v1/files/:key"]
        F2["GET /v1/images/:key"]
        F3["GET /v1/files/:key/variables/local"]
    end
    subgraph Parser
        P1["resolve VARIABLE_ALIAS chains"]
        P2["export-screenshots.ts<br/>scale=2, PNG"]
        P3["to-semantic-ast.ts<br/>walk tree → adapter.mapNodeType"]
        P4["extract-material.py<br/>md5 + perceptual hash dedup"]
    end
    subgraph Snapshot["<snapshot>/"]
        O1[design-context.json]
        O2[screenshots/*.png]
        O3[semantic-ast.json]
        O4[design-variables.json]
        O5[image-map.json]
    end
    F1 --> P1 --> O1
    F2 --> P2 --> O2
    F3 --> P1
    P1 --> P3 --> O3
    P3 --> O4
    P2 --> P4 --> O5
    P4 -.rewrites.-> O3
    P3 --calls--> A["adapter.transformColor<br/>adapter.mapNodeType<br/>adapter.layoutMaps"]
    style A fill:#fde2e2,stroke:#a55
```
Screenshots and the dedup that’s older than your assets folder
Figma documents repeat themselves. The same close-icon is dropped into 30 screens; the same “Preview” thumbnail gets reused across cards; designers paste-and-tweak instead of componentizing. A naive parser would emit 30 distinct PNG assets for the same icon, and the LLM doing code-gen would happily register all 30 with auto-generated names that nobody can search for later.
The dedup pass (extract-material.py) does three things:
- Find candidates. Walk the AST, collect every node whose `type` is in `(adapter.svgPictureType, adapter.staticImageType)` — i.e., everything that will render as an image asset under this workspace’s stack.
- Cluster by similarity. Default is `file_hash` (byte-level MD5), which catches verbatim duplicates. Optional is imagededup’s `phash` (perceptual hash, recommended), which catches “same icon, slightly different export pixels” — a real failure mode when designers re-export sub-pixel-aligned vectors. The threshold (`0.95` default for phash, equivalent to Hamming distance ≤ 6 on the 64-bit hash) is the empirical sweet spot we landed on after chewing through a few real design files.
- Rewrite the AST. Emit `image-map.json` keyed by `<categoryPrefix><TypeName><NameSuffix>` — `unionIcon`, `areaRectangle184`, etc. — and rewrite each AST node’s `screenshot` field from a filesystem path to that key. The map keeps `{ snapshot: "screenshots/...", md5: "..." }` so downstream code-gen can do another cross-workspace dedup against existing assets in `lib/assets/` or `Assets/` before writing.
| Method | Speed | Recall | When to use |
|---|---|---|---|
| `file_hash` | Fastest | Low | When designers always re-export the same source |
| `phash` | Fast | High | Default. Robust to sub-pixel rendering noise |
| `cnn` | Slow | Highest | Visually similar but structurally distinct icons |
The image-map keys are deliberately not meant for human consumption — they’re stable identifiers
the AST uses to point at assets. When code-gen actually writes a file into assets/, it picks a
business-meaningful name (user_avatar.png, close_icon.svg) based on the node name, the
screenshot, and the snapshot’s guide.md. The split keeps the parser deterministic (same input →
same keys) while the agent gets to make the human-readable naming decision once, in context.
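The `file_hash` path of that clustering can be sketched as pure data-munging: group candidates by digest, keep one representative per cluster, hand back stable keys. The `Candidate` shape and key format below are illustrative; hashing happens upstream:

```typescript
// Sketch of the file_hash clustering path. Shapes and the key scheme are
// illustrative approximations of image-map.json, not the real script.
type Candidate = { nodeId: string; typeName: string; screenshotPath: string; md5: string };
type ImageMapEntry = { snapshot: string; md5: string; nodes: string[] };

function buildImageMap(candidates: Candidate[]): Map<string, ImageMapEntry> {
  const byHash = new Map<string, Candidate[]>();
  for (const c of candidates) {
    const cluster = byHash.get(c.md5);
    if (cluster) cluster.push(c); else byHash.set(c.md5, [c]);
  }
  const map = new Map<string, ImageMapEntry>();
  let seq = 0;
  for (const cluster of byHash.values()) {
    const rep = cluster[0];
    const key = `union${rep.typeName}${seq++}`; // stable identifier, never user-facing
    map.set(key, { snapshot: rep.screenshotPath, md5: rep.md5, nodes: cluster.map((c) => c.nodeId) });
  }
  return map;
}
```

Determinism falls out for free: same candidates in, same clusters and keys out.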
The details that don’t fit a flowchart
Some of the more useful design choices don’t show up as boxes:
- `schemaVersion` is a wall, not a hint. Every config file written under `.hx/figma2code/` carries a `schemaVersion`. The code-side `SCHEMA_VERSION` constant lives in `extensions/schema-version.json`. On every parser/code-gen entry, `validateF2CWorkspace` compares them; mismatch returns a `Failure` envelope whose message tells the user verbatim: “delete `.hx/figma2code/` and re-run generic-design-init”. The agent passes that message through unchanged. It does not offer to migrate. Migration is a class of bug we don’t take on; users are better off rebuilding workspace state than debugging a half-migrated one.
- Cache pruning is a tool call, ordered before status sync. Each parser run leaves a new `<timestamp>/` snapshot. Without intervention `output/<framework>/` would balloon. The skill ends with two ordered tool calls: `cleanF2CCache` (keep last 10 snapshots, repoint `latest.txt`) before `syncF2CStatus`. The order matters — flipping it would briefly leave `status.json` referencing a snapshot the next call is about to delete.
- The lint pass is non-blocking on purpose. After the AST is built but before status sync, the skill prints an AST lint report: nesting depth, redundant single-child wrappers, repeated topology that should have been a Component, hard-coded colors that bypass Variables. It writes nothing, blocks nothing, and refuses to “fix” anything. Designers iterate; if the report flags something, the human decides whether to clean the Figma file and re-parse, or live with the current AST. Quality nudges, not quality gates.
- Code-gen has a hard scope-confirm loop. Before any file write, `generic-code-gen` produces a markdown checklist (create / modify / delete / asset, with rationale and node references) and asks the user `approve / modify / cancel`. Sit in the loop forever if you must. There is no timeout, no “default approve”, no “I think this is what you meant” path. When the LLM tries to wander outside the approved scope mid-write, the rule says: stop, go back to the loop, get a fresh approval. This is the one place where the agent’s normal enthusiasm gets actively suppressed.
- The event stream is its own protocol. `notifyF2CEvent` is a registered tool whose payload is a `{ code, message, payload: { event, data? } }` envelope, written into a normal `tool_execution_end` channel. There are 26 event literals across the three skills, each one a kebab-case string the desktop client decodes into a typed enum. We didn’t invent a new socket — we stole a tool slot and used it as a status bus. More on that in Section 2.
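The pruning decision itself is simple enough to sketch as pure data: timestamp directory names sort lexicographically, so “newest N” is a sort and a slice (`planCachePrune` is a hypothetical name; the real `cleanF2CCache` also performs the deletes):

```typescript
// The cache-prune decision as pure data. planCachePrune is illustrative;
// the real tool does the filesystem work and rewrites latest.txt.
function planCachePrune(snapshots: string[], keep = 10): { remove: string[]; latest: string | null } {
  const sorted = [...snapshots].sort(); // timestamp names sort lexicographically
  const survivors = sorted.slice(-keep);
  return {
    remove: sorted.slice(0, Math.max(0, sorted.length - keep)), // oldest first
    latest: survivors.length > 0 ? survivors[survivors.length - 1] : null, // → latest.txt
  };
}
```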
generic-code-gen, where the AST goes back to being source code
The parser is upstream; code-gen is the downstream consumer that turns its outputs into actual files in the user’s project. The skill’s contract is unusually narrow: read seven things, write only inside an approved scope, and call one lifecycle hook on the way out.
Its inputs split cleanly into two tiers — snapshot-scoped (what this design says) and workspace-scoped (how this project speaks):
| Tier | File | Role at code-gen time |
|---|---|---|
| Snapshot | `<snapshot>/guide.md` | The `--description` the user gave at parse time. Names the task (“product detail v3”), used to disambiguate when the AST has multiple frames and to seed commit-message-grade context. |
| Snapshot | `<snapshot>/semantic-ast.json` | Primary structure: nodes, layout, inline style, text. Every type literal here is interpreted via the workspace AST schema doc, not parser source. |
| Snapshot | `<snapshot>/design-variables.json` | Resolved Design Tokens. Code-gen maps these to the target stack’s idiom — ThemeData extension in Flutter, `:root { --color-* }` in CSS, static Color constants in Avalonia. |
| Snapshot | `<snapshot>/screenshots/*.png` | Visual cross-check. The agent reads them when AST-only inspection is ambiguous (e.g. “is this gap really 16, or am I miscounting padding?”). |
| Snapshot | `<snapshot>/image-map.json` | key → `{ snapshot, md5 }`. AST nodes’ `screenshot` fields point at these keys; the md5 powers the cross-workspace asset dedup discussed below. |
| Workspace | `<workspace>/.hx/figma2code/config.json` | Source of truth for language / framework. Code-gen does not infer the stack from project files — the init skill already decided. |
| Workspace | `<workspace>/<config.guidePath>` (default `guide.md`) | Project-level long-form description (tech stack, accessibility goals, naming preferences). Read on every run as the long-term baseline; the snapshot guide is the short-term overlay. |
| Workspace | `<workspace>/<config.astSyntaxPath>` (`astSyntaxAutoGenerated.md`) | The only authority on what every type / enum / color literal in the AST means in this workspace’s stack. |
| Workspace | `<workspace>/<config.codingStandardPath>` | Coding conventions — either the user-provided file or the init-fallback `codingStandardAutoGenerated.md`. Drives naming, file layout, dependency style. |
| Workspace | `<workspace>/.hx/figma2code/extensions/generic-design-parser.ts` | The same `FrameworkAdapter` the parser used. Code-gen reads it for the source-of-truth color/layout/type mappings rather than re-deriving them. |
| Workspace | `<workspace>/.hx/figma2code/extensions/generic-design-gen.ts` | The code-gen adapter. Exactly one hook: `postCodeGenFinished(isSuccess, message)`. Empty by default. |
The execution itself is twelve named steps in SKILL.md, but the ones that actually shape
behavior are these:
- Validate, then load both guides. `validateF2CWorkspace` first — same `schemaVersion` wall as the parser. Then read the workspace-level `guide.md` (long-term baseline) and the snapshot-level `guide.md` (this task’s intent). When the agent has to decide “which frame in this AST is the user actually asking me to render?”, these two together resolve it.
- Resolve types through the schema doc, not the parser. Every time the agent sees `"type": "Row"` in the AST, it looks up `Row` in `astSyntaxAutoGenerated.md` to find the target-stack widget name, the layout enum literals, and the color-literal example. This is why the schema doc is rendered at init time from the adapter — it’s the contract code-gen will read at every invocation, never the parser source.
- Plan first, write later, on the user’s say-so. Before any file write, the agent produces a markdown checklist (`create`/`modify`/`delete`/`asset` per row, with the workspace-relative path, the rationale, and the source AST node IDs) and presents it via `AskUserQuestion` with three buttons: `approve`, `modify`, `cancel`. The skill loops on `modify` indefinitely — there’s no timeout, no “I think this is what you meant” silent approval, no implicit upgrade of `cancel` to `approve` after N rounds. If, mid-write, the agent realizes it needs to touch a file outside the approved scope, it must stop and return to this planning step with a fresh checklist. The hard gate exists because the LLM’s natural instinct — “fix while you’re in the area” — is exactly the failure mode that turns code-gen runs into unreviewable churn.
- Write in place, into paths the user named. No staging directory, no shadow copy, no pull-request branch. The file the agent edits is the file the user opens in their editor; the diff the user reviews is the diff their VCS shows them. “Place” matters — the agent is forbidden from touching anything outside the approved scope (the planning step above enforces this).
- Asset dedup against the workspace, not just the snapshot. The parser’s `image-map.json` already deduped within a snapshot. Code-gen does a second pass: before writing a new asset into `assets/` or wherever, it MD5s every existing image file in the workspace and compares against the `md5` field of each `image-map` entry. A match means “reuse the existing path, don’t write”. Only fresh hashes get a new file, and only at code-gen time does the agent pick a human-readable filename (`user_avatar.png`, `close_icon.svg`) using the screenshot, the AST node name, and the snapshot guide — the parser’s `unionIcon`/`areaRectangle184` keys are stable identifiers, never user-facing.
- Sync, then run the post-hook, then announce done. The last three calls are `syncF2CStatus` (writes `status.json` so the desktop can re-render off filesystem state) → `await adapter.postCodeGenFinished(isSuccess, message)` (the workspace’s chance to kick off `flutter run`, trigger a hot reload, run a smoke build) → `notifyF2CEvent code-gen-done`. The hook can throw — the agent catches and continues to `done` regardless. Failures in the post-hook are workspace concerns, not skill concerns.
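The workspace-side dedup decision reduces to a hash lookup. A sketch with hashing and file IO elided (`decideAsset` and the shapes are hypothetical names for illustration):

```typescript
// The code-gen-time dedup decision: reuse an existing asset path on an md5
// match, otherwise write under the agent-chosen human name. decideAsset is
// a hypothetical name; hashing and file IO are elided.
type AssetDecision = { action: "reuse"; path: string } | { action: "write"; path: string };

function decideAsset(
  existing: Map<string, string>, // md5 → workspace-relative path of an existing asset
  entryMd5: string,              // md5 field of the image-map entry
  proposedName: string,          // human-readable name the agent picked
): AssetDecision {
  const hit = existing.get(entryMd5);
  return hit !== undefined
    ? { action: "reuse", path: hit }
    : { action: "write", path: `assets/${proposedName}` };
}
```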
```mermaid
flowchart LR
    subgraph Snapshot["<snapshot>/"]
        S1[guide.md]
        S2[semantic-ast.json]
        S3[design-variables.json]
        S4[screenshots/*.png]
        S5[image-map.json]
    end
    subgraph Workspace[".hx/figma2code/"]
        W1[config.json]
        W2[guide.md]
        W3[astSyntaxAutoGenerated.md]
        W4[coding standard]
        W5[generic-design-parser.ts]
        W6[generic-design-gen.ts]
    end
    subgraph CodeGen["generic-code-gen"]
        G1["validate +<br/>load both guides"]
        G2["interpret AST<br/>via schema doc"]
        G3["plan checklist<br/>approve / modify / cancel loop"]
        G4["write in place<br/>asset md5 dedup vs workspace"]
        G5["syncF2CStatus<br/>+ postCodeGenFinished hook"]
    end
    P[(target project<br/>files in place)]
    S1 --> G1
    W1 --> G1
    W2 --> G1
    S2 --> G2
    W3 --> G2
    W5 --> G2
    S3 --> G2
    G2 --> G3
    S4 --> G3
    W4 --> G3
    G3 --approve--> G4
    S5 --> G4
    G4 --> P
    G4 --> G5
    W6 --> G5
    style G3 fill:#fde2e2,stroke:#a55
    style P fill:#dff7e0,stroke:#7a8
```
Notice what’s not in the loop: the agent never re-fetches Figma, never re-runs the parser, never tries to “refresh” anything. The snapshot is the immutable source-of-truth for this generation; if the design changed, the user runs the parser again, which produces a new snapshot, which becomes the input to a new code-gen invocation. That immutability is what makes the skill safely re-runnable on the same snapshot — every call sees the exact same inputs and produces the exact same checklist (modulo LLM nondeterminism), so the user can iterate on the scope (“actually, also touch the theme file”) without worrying that the underlying design data is shifting under them.
Section 2 — The desktop layer: agent-as-backend
Most desktop apps that “talk to an LLM” treat the model as one capability among many, behind a
service layer that also hits a database, a file system, a remote API. The FigmaToCode workload
is more extreme: the agent is the entire backend. There is no FigmaService, no
SnapshotService, no REST proxy. Every meaningful state change in <workspace>/.hx/figma2code/
happens because the agent ran a skill. The desktop’s job is to (a) gather inputs, (b) hand them
to the agent, (c) watch the workspace, (d) re-render.
This section walks through that loop, the contracts that hold it together, and a few specific design merits that earn their keep.
The 30-second tour of the data flow
A few things in that loop are doing a lot of work.
The two-track parameter contract
When the user clicks “Import snapshot”, the view-model has both a structured request object
(SnapshotImportRequest { WorkspacePath, FigmaUrl, Alias, Description }) and a free-text
triggerPrompt for the agent. The desktop sends both, on two tracks:
| Track | Payload | Consumer |
|---|---|---|
| `ContextEntry`s | `figma2code.parser.workspace_path = "/Users/.../proj"`, etc. | The agent’s “read context” path — keys are documented in SKILL.md. |
| `triggerPrompt` | A natural-language message listing the same params as bullet points. | The agent’s `IAgentController.AsyncInputMessage` text channel. |
Why both? Because the LLM might preferentially read either, depending on its tuning. The
SKILL.md spells out a “context-first, prompt-second” rule: if every required key is present and
non-empty in ContextEntry, the agent must not ask the user for it again via
AskUserQuestion. The trigger prompt is a fallback narration in case the agent ignores the
structured channel; it’s also a hint to the agent about which skill to invoke. The two tracks
are redundant on purpose — defensive design against an LLM that happens to glance at one channel
and not the other.
```csharp
// Service/AgentRequestService.cs (excerpt)
List<ContextEntry> entries = [
    Attr("figma2code.parser.workspace_path", request.WorkspacePath),
    Attr("figma2code.parser.figma_url", request.FigmaUrl),
    Attr("figma2code.parser.alias", alias),
    Attr("figma2code.parser.description", description),
];
_contextBuffer.AppendRange(entries);

string prompt =
    L.Get("F2C_Agent_Prompt_ParserHeader") + "\n" +
    L.Format("F2C_Agent_Prompt_LabelWorkspace", request.WorkspacePath) + "\n" +
    L.Format("F2C_Agent_Prompt_LabelFigmaUrl", request.FigmaUrl) + "\n" +
    L.Format("F2C_Agent_Prompt_LabelAlias", alias) + "\n" +
    L.Format("F2C_Agent_Prompt_LabelDescription", description);

await controller.AsyncInputMessage(BuildRequestMessageJson(prompt));
```
The header sentence (“Trigger skill: `generic-design-parser`; arguments below are complete...”) is localized so the agent answers in the user’s UI language. The keys themselves (`workspace`, `snapshot`, `alias`) stay ASCII across all locales — the agent has been tuned to recognize them as parameter names, not vocabulary. UI culture flips the prose, not the keys.
The buffer + flush contract
AgentContextBuffer is twenty lines of code and pulls a surprising amount of weight:
```csharp
internal sealed class AgentContextBuffer {
    private readonly object _gate = new();
    private readonly List<ContextEntry> _entries = new();

    public void AppendRange(IEnumerable<ContextEntry>? entries) {
        if (entries is null) return;
        lock (_gate) { _entries.AddRange(entries); }
    }

    public List<ContextEntry>? Flush() {
        lock (_gate) {
            if (_entries.Count == 0) return null;
            var snapshot = new List<ContextEntry>(_entries);
            _entries.Clear();
            return snapshot;
        }
    }
}
```
The shape of this object encodes a real invariant: context is delivered exactly once, on the
host’s terms. The view-model Appends. The agent host calls IContextProvider.AsyncQueryContext
right before sending a request. That call is what flushes the buffer; nothing else can read it.
There is no cache, no history, no replay. This means:
- Multiple `Submit*Async` calls before a flush get coalesced. If a user fires a parser request and then a code-gen request before the agent has actually started, both context sets land in the same agent turn. (In practice the UI prevents this, but the buffer doesn’t have to know.)
- If the user cancels mid-flight (`RequestStopAgentAsync`), the buffer is flushed and discarded manually. The next agent turn starts clean.
- The locking is intentionally a single mutex over a `List<>`. We measured: payloads are tens of entries, calls are rare. A `ConcurrentQueue` would have made `Flush` non-atomic, which is the one operation that has to be all-or-nothing.
The delta channel: streaming progress without inventing a protocol
The agent emits 26 distinct lifecycle events. The desktop wants to render them as “stage 4 of
9, currently parsing AST…” without polling. The trick is that pi (the agent host) already has
a tool_execution_end channel that streams every tool call result back to the client. So we
make event-reporting itself a tool:
```typescript
// agents/figma2code/extensions/tools/notify_f2c_event.ts (sketch)
registerTool('notifyF2CEvent', async ({ event, code, message, data }) => {
  validateAgainstSchema(event, code, data);
  return JSON.stringify({
    code,
    message,
    payload: code === 0 ? { event, data } : {}
  });
});
```
The agent calls notifyF2CEvent whenever it enters a new phase. The tool’s result is the
envelope. The desktop catches it on the way out:
```csharp
// AgentDeltaParser.cs — runs on every stream delta
public IReadOnlyList<F2CAgentEnvelope> Parse(string? delta) {
    // 1. Parse delta as JSON, look for the tool_call array.
    // 2. For each entry, track id → name across start/update/end (the hx client
    //    doesn't re-emit toolName on later phases).
    // 3. When type == "end" && toolName == "notifyF2CEvent", deserialize the
    //    output string into F2CAgentEnvelope and yield it.
}
```
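The id → name bookkeeping that comment describes can be sketched concretely. Assuming a simplified delta shape (the real stream is richer), the tracker joins later phases against the name seen at `start` and only yields the envelope JSON on `end`:

```typescript
// Sketch of the id → toolName join: only "start" carries the name, so the
// tracker remembers it per id and yields the envelope string only on "end".
// The ToolDelta shape is a simplification, not the real stream schema.
type ToolDelta = { id: string; type: "start" | "update" | "end"; toolName?: string; output?: string };

class ToolCallTracker {
  private names = new Map<string, string>();

  feed(d: ToolDelta): string | null {
    if (d.type === "start" && d.toolName !== undefined) this.names.set(d.id, d.toolName);
    if (d.type === "end" && this.names.get(d.id) === "notifyF2CEvent" && d.output !== undefined) {
      this.names.delete(d.id);
      return d.output; // the F2CAgentEnvelope JSON string
    }
    return null;
  }
}
```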
Then FigmaToCodeViewModel.OnAgentDeltaAsync walks the envelopes and Posts them to the UI
thread, where AgentProgressIndicatorViewModel.UpdateEvent(F2CAgentEvent) recomputes the step
list. Each skill has a fixed sequence (F2CAgentEventSequences.DesignParser,
...CodeGen, ...Init); the indicator just IndexOfs the current event and marks everything
before it Done, the current one InProgress, the rest Pending.
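The recompute is just an index comparison. A sketch with illustrative names:

```typescript
// Step states from a skill's fixed event sequence: everything before the
// current event is Done, the event itself InProgress, the rest Pending.
type StepState = "Done" | "InProgress" | "Pending";

function computeSteps(sequence: string[], current: string): StepState[] {
  const idx = sequence.indexOf(current);
  return sequence.map((_, i) =>
    idx < 0 ? "Pending" : i < idx ? "Done" : i === idx ? "InProgress" : "Pending",
  );
}
```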
There’s one fun detail: when the user types into the chat directly instead of clicking a
button, the desktop doesn’t know which skill will run. So IContextProvider.AsyncQueryContext
checks: if the progress indicator isn’t visible yet, open it with a generic AgentRunning
sequence (one step, “Running…”). The first real lifecycle event that arrives — say,
design-parser-fetching-design — triggers UpdateEvent, which detects the sequence change
(InferSequenceFromEvent) and swaps the entire step list to the right skill’s sequence.
The transition is lossless and seamless.
The view-model boundary: four interfaces, no implementations leaked
The UI assembly references the view-model assembly for one reason only: to know what shape the data context has. It does this through four small interfaces:
| Interface | Role |
|---|---|
| `IFigmaToCodeViewModel` | Observable surface: `InitializationStatus`, `ProjectBrowser`, `SnapshotSection`, the `Submit*Async` methods. |
| `IFileTreeViewModel` | Defined in Nut.Base.Gui, implemented by `ProjectBrowserViewModel`. Drives the file tree. |
| `ICodeEditorViewModel` | Same — drives the read-only code viewer. Same VM implements both, so click-to-open is automatic. |
| `IAgentDeltaSink` | Lets the host plug the delta stream into us via `OnAgentDeltaAsync(string)`. |
The XAML side never imports a class from the view-model assembly. Bindings are typed via
x:DataType="abs:IFigmaToCodeViewModel", the file tree is a nb:FileTreePanel whose
DataContext is just object? from the interface’s perspective, and the runtime resolves the
actual implementation. This means we can refactor the view-model freely (split classes, change
namespaces, swap CommunityToolkit.Mvvm for something else) without recompiling a single XAML
file.
The [ObservableProperty] source generator does the boilerplate; one-way bindings cover most
of the surface, and the few Mode=TwoWay bindings (workspace path input, P4 fields, snapshot
selection) flow back through the same property setters that drive Submit*Async.
File tree, snapshots, and the watch-the-workspace pattern
When the agent finishes, it leaves files behind. The desktop notices via two paths:
- `IContextProvider.AsyncUpdateContext` is called by the host at the end of every agent turn. The view-model uses this as the canonical “rescan now” signal: re-load `WorkDataModel` (which checks for `config.json` / status / framework dir), call `SnapshotSectionViewModel.Refresh()`, refresh the project browser’s VCS scan, hide the progress indicator. One synchronous-looking method, all the workspace state-sync work funneled through it.
- The `IFileTreeViewModel` change scanner (`GitChangeScanner`/`P4VChangeScanner`) is deliberately not an inotify watcher. It’s a one-shot scan triggered by the same `AsyncUpdateContext`, plus user-driven refresh. Watchers are noisy (compile output, gitignored IDE files, cache flushes), and we’d rather rescan once after a known event than chase thousands of false positives. The scanner produces `ChangedFile` records with `ChangeKind ∈ { Added, Modified, Deleted }`, which color-code the file tree nodes (green / yellow / red dots) so the user can visually verify what the last skill touched.
The snapshot list works the same way: SnapshotLoader does pure disk IO (read
design-context.json, count screenshots, count image-map.json keys) on a background thread,
posts results back to the UI thread, and ListBox redraws. No background polling, no file
watcher — re-render happens when the user did something or when the agent finished something.
Localization: prose flips, keys don’t
A small but earnest detail. The same AgentRequestService that builds trigger prompts for the
agent uses the workload’s resx files (Strings.{en,zh-CN,ja,ko}.resx). Two policies coexist:
- User-facing sentences (header lines, fallback descriptions, the “please explain…” template) are translated. The agent ends up answering in the same language the UI is in, because its trigger prompt arrived in that language.
- Machine-readable keys (`workspace`, `snapshot`, `alias`, `vcs`, `p4 port`, `figma url`, `description`, `change scope`, `change kind`, `change range`) stay ASCII across every locale. The agent has been tuned to recognize them as parameter names; flipping them to translated Chinese / Japanese / Korean would break key recognition without buying anything for the user.
So the resx for F2C_Agent_Prompt_LabelWorkspace is "- workspace: {0}" in all four files. The
header above it (F2C_Agent_Prompt_ParserHeader) is fully translated. This is the only sane
split: localize what the user reads, freeze what the LLM parses.
What pays for itself
A few design merits on the desktop side that I’d port to the next agent-backed app without hesitation:
- The agent is the only writer. The desktop reads `<workspace>/.hx/figma2code/` and watches it; it never writes. This collapses an entire class of “UI and agent disagree about state” bugs into “the agent is the source of truth, period.”
- One indicator, three skills, four states. `AgentProgressIndicatorViewModel` is one ~230 LoC observable that handles: explicit Show (button path), passive Show (chat path with sequence inference), skill switch mid-stream, indeterminate fallback, user-confirm-then-Hide. Everything the user sees during a 30-second agent run flows through that one VM.
- `ContextEntry` + trigger prompt as belt-and-suspenders. The dual-track parameter contract cost an extra dozen lines per submit method and bought us robustness against LLM tuning drift, which has actually saved us more than once when we swapped models.
- Refusing to migrate `schemaVersion`. Telling users “delete this directory and re-init” feels rude until you realize the alternative is migration code that nobody tests because schema bumps are rare. Forcing the cliff keeps the agent’s invariants strong.
Wrap
The agent layer earns its keep by separating generic walking from framework knowledge through two adapter files the workspace owns. The desktop layer earns its keep by treating the agent as its sole backend — passing parameters on two tracks, draining lifecycle events through a hijacked tool channel, and re-rendering off filesystem state the agent owns. Both halves are small enough to read end-to-end in an afternoon, and the contract between them is two JSON schemas plus a 26-entry enum.
If I had to give it a slogan: a strict generic core, a permissive set of hooks, and a UI that trusts the workspace as its own database. That’s the whole shape.