Environment
macOS 26.3 on arm64, Rust 1.94.0, Node v23.11.0, local pinned Wasmtime v39.0.0.
These numbers are intended as honest proof points, not universal claims. They show that the UMA examples can publish measurable artifact sizes and repeated execution timings while keeping the architectural claim grounded: write once, run where it makes sense.
The measurements here come from reproducible local runs against fixed inputs. They are useful because they expose tradeoffs clearly. They are not meant to imply that one runtime always wins in every deployment environment.
The stronger claim UMA makes is not raw speed alone. It is that portable behavior remains measurable, comparable, and explicit across runtime choices.
The startup and memory numbers are especially sensitive to local machine state, filesystem caches, and host runtime behavior. That is why the page shows both repeated timings and one first measured run, rather than pretending there is only one number worth publishing.
Method
Release builds, fixed inputs, warmup runs, then repeated local measurements reporting mean, median, min, and max timings.
Related chapters
Chapter 4 for a minimal portable evaluator, and Chapter 6 for native-versus-WASI portability under one contract.
Out of scope
These are not cloud benchmarks, distributed throughput tests, or cost comparisons across production-scale environments.
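The measurement method above (warmup runs discarded, then repeated timed runs summarized as mean, median, min, and max) can be sketched in TypeScript. This is a minimal illustrative harness, not the repository's reporting script; the command, arguments, and warmup/run counts are placeholders.

```typescript
import { execFileSync } from "node:child_process";

// Summarize repeated timings the same way the tables on this page do.
function summarize(samplesMs: number[]) {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return {
    mean: samplesMs.reduce((sum, x) => sum + x, 0) / samplesMs.length,
    median:
      sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    runs: samplesMs.length,
  };
}

// Time one command repeatedly: discard warmup runs, keep the measured runs.
function benchmark(cmd: string, args: string[], warmups = 3, runs = 20) {
  for (let i = 0; i < warmups; i++) execFileSync(cmd, args);
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    execFileSync(cmd, args);
    samples.push(Number(process.hrtime.bigint() - start) / 1e6); // ns -> ms
  }
  return summarize(samples);
}

// Hypothetical invocation; the real runner paths live in the repository.
// benchmark("wasmtime", ["evaluator.wasm", "lab2-rollout-match.json"]);
```

A separate cold invocation, timed before any warmup, is what yields the single "first measured run" number shown alongside the repeated timings below.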
These comparisons are normalized inside each row so the slower or larger path sets the full scale. They are not trying to imply global winners across workloads. They are here to make the local tradeoff legible quickly.
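The per-row normalization described above can be sketched as follows. `normalizeRow` is an illustrative helper, not code from the repository: it scales each row independently so the slower or larger entry defines the full scale.

```typescript
// Scale values within one comparison row so the largest entry is 1.0 (full
// scale). Each row is normalized independently, so bars compare paths within
// a row but never across rows or workloads.
function normalizeRow(values: number[]): number[] {
  const max = Math.max(...values);
  return values.map((v) => (max === 0 ? 0 : v / max));
}

// Example row: Chapter 6 artifact sizes, 352.33 KiB (WASI) vs 5.31 MiB
// (native, = 5437.44 KiB). The native artifact sets the full scale here.
const artifactRow = normalizeRow([352.33, 5437.44]);
```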
Chapter 4 is the right first footprint proof because it shows the smallest portable unit in the repo: one contract, one deterministic evaluator, and one WASI boundary.

- Artifact size: 198.00 KiB
- Input: lab2-rollout-match.json
- First measured run: 51.63 ms (Rust/WASI), 96.45 ms (TypeScript/Node)
- Peak memory: 17.09 MiB (Rust/WASI), 37.95 MiB (TypeScript/Node)

| Path | Mean (ms) | Median (ms) | Min (ms) | Max (ms) | Runs |
|---|---|---|---|---|---|
| Rust WASI via Wasmtime | 33.42 | 31.93 | 22.70 | 57.89 | 20 |
| TypeScript via Node | 271.17 | 125.61 | 82.44 | 1662.06 | 20 |
On this fixed input, the WASI evaluator stayed dramatically smaller and lighter in memory than the TypeScript path under the same local measurement method. The timing spread is visibly noisy on the Node side, which is exactly why this page now shows repeated timings together with one first measured run. The architectural point is not that Rust or WASI always wins. It is that one portable evaluator can stay compact and measurable without being rewritten into a second implementation just to change the runtime surface.
Chapter 6 is the more interesting benchmark because it keeps one contract and one shared service behavior while running it through both a native path and a WASI path.

- Artifact size: 5.31 MiB (native), 352.33 KiB (WASI)
- Input: sample-data/sample.pgm
- First measured run: 336.64 ms (native), 212.73 ms (WASI via Wasmtime)
- Peak memory: 7.80 MiB (native), 45.88 MiB (WASI via Wasmtime)

| Path | Mean (ms) | Median (ms) | Min (ms) | Max (ms) | Runs |
|---|---|---|---|---|---|
| Native runner | 56.73 | 45.79 | 25.79 | 157.13 | 20 |
| WASI runner via Wasmtime | 379.63 | 290.29 | 152.34 | 1616.95 | 20 |
This is the tradeoff the page should make explicit. The native runner stays lighter in memory and faster on repeated runs on this machine, while the WASI runner stays much smaller as an artifact and preserves the same behavior under a portable execution target. That is exactly the kind of evidence UMA should publish instead of hiding behind vague portability language.
The earlier chapters prove compact portability. Chapter 13 adds a different kind of evidence: the reference runtime can still expose a deterministic local path without dragging the benchmark through the model-backed AI setup. The measured slice here is the release CLI rendering use-case-1-basic-report as JSON.

- Binary size: 947.95 KiB
- Report: use-case-1-basic-report
- First measured run: 964.64 ms
- Peak memory: 2.19 MiB

| Path | Mean (ms) | Median (ms) | Min (ms) | Max (ms) | Runs |
|---|---|---|---|---|---|
| Render JSON via CLI | 512.36 | 312.45 | 48.06 | 2228.30 | 20 |
That does not measure the full AI-assisted workflow. It measures the deterministic runtime/report path that should stay fast enough and legible enough to act as a real operational surface, not just a demo shell. It also makes one useful point visible: the Chapter 13 CLI is still a small binary with a modest memory footprint even though the repeated local timing of the full JSON render path is not as tight as the earlier chapter slices.
The important result is not that every portable path is the fastest. The important result is that the portable path remains compact, measurable, and behaviorally comparable. That gives teams a much cleaner basis for deciding where logic should run instead of duplicating the behavior by default.
In the book, I use the architectural model behind these examples more broadly than this page can. Here, the goal is narrower: give you honest proof data you can inspect instead of asking you to trust portability claims at face value.
If you want to reproduce these numbers locally, the repository includes the reporting script and the generated benchmark artifacts so you can inspect both the method and the result rather than treating this page as a static marketing claim.
If the benchmark notes make the tradeoff feel more concrete, the next useful move is to connect them back to portability, runtime authority, and the reference app.