
Cloudflare DO Facets in practice: cold-wake and boundary cost

6 min read
Larry Maccherone
Founder, Lumenize and Transformation.dev

Cloudflare's Durable Object Facets shipped on April 13, 2026.¹ Cloudflare's framing is that they're "essentially free." That's accurate at the infrastructure layer — same V8 isolate as the parent DO, no extra billing line, no separate Worker. From a cold-wake and per-call latency perspective, though, it's not zero, and I needed to know by how much.

While building Nebula, I wanted to host a per-tenant typia parse-validator close to each tenant's write DO. Facets were the obvious choice, but I wanted real numbers before committing. This post is what I measured: cold-wake contribution and warm RPC boundary cost. For the non-facets-related benchmarking results — throughput, gate semantics, and what I had to unlearn about Durable Objects under load — see the companion post: What I got wrong about Durable Object throughput.

TL;DR

Numbers below come from a ~119 KB facet bundle hosted on a Durable Object that owns the writes. The fixture is a typia parse-validator, but the boundary costs generalize to any facet workload of similar bundle size:

  • Cold-wake adds ~262 ms above whatever your DO already pays at first wake — bundle load + module parse + first call into the bundle's exports. One-time per bundle.
  • Warm boundary cost: ~1.35 ms per facet call. Our 1.4 ms total is ~1.35 ms generic boundary plus ~50 µs of inner work (a typia parse, in this fixture). The boundary number generalizes; the inner work depends on what you put in the bundle.

What "facet" means here

A Durable Object Facet is a way to run a Dynamic Worker inside a parent Durable Object's V8 isolate — same process, same thread, no network hop. Each facet gets its own 128 MB memory budget, but they share the runtime infrastructure, which is what makes the call cheap. A call into a facet is just a local Workers RPC hop, not a network round-trip.
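To make that concrete, here is roughly what hosting a validator in a facet looks like. Treat it as a sketch, not an API reference: facets are in beta, the exact shape of ctx.facets.get and its startup callback is my reading of the beta docs, and ValidatorFacet, TenantWriteDO, and validate are hypothetical names.

```ts
// Sketch only. Facets are in beta; the ctx.facets.get() signature and
// startup-callback shape below are assumptions from the beta docs, and
// all class/method names here are hypothetical.
import { DurableObject } from "cloudflare:workers";

// The facet: hosts the ~119 KB validator bundle, gets its own 128 MB
// memory budget, shares the parent's isolate and thread.
export class ValidatorFacet extends DurableObject {
  validate(json: string): boolean {
    // ...the inner work (~50 µs in our fixture) goes here...
    return true;
  }
}

// The parent: the per-tenant write DO.
export class TenantWriteDO extends DurableObject {
  async write(json: string): Promise<void> {
    // A local Workers RPC hop, not a network round-trip:
    // ~1.35 ms warm, plus ~262 ms once on the first cold call.
    const facet = this.ctx.facets.get("validator", () => ({
      class: ValidatorFacet, // assumed startup-callback shape
    }));
    if (!(await facet.validate(json))) throw new Error("invalid payload");
    // ...commit the write...
  }
}
```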

The hop is fast but measurable. For most workloads, other costs — cold-wake when the bundle has to load and parse, the actual work the bundle does — will dominate your performance modeling. The ~1.35 ms boundary cost only matters on hot paths or in latency-sensitive use cases.

The fixture

To measure the facet boundary, we needed a realistically sized workload running in the facet. Ours is a ~119 KB module that happens to be some real work we were doing for Nebula.²

If your facet hosts something else of similar bundle size — a rules engine, a sandboxed transformer, an LLM agent's generated code à la Cloudflare Code Mode — the boundary numbers (cold-wake, warm RPC) should land in the same neighborhood. What changes is the inner work cost (~50 µs per call in our workload).
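For a feel of the inner work, here is a minimal single-type sketch of the fixture's shape; the real fixture is 30 types, and Task and parseTask are illustrative names. typia generates the validator at compile time via its TypeScript transformer.

```ts
import typia from "typia";

/** A stand-in for one of the 30 ontology types. */
interface Task {
  id: string;
  title: string;
  /** @format email */
  ownerEmail: string;
  /** @minimum 0 */
  priority: number;
  parent: Task | null;
  tags: string[];
}

// Parses a JSON string and validates it against Task in one pass.
// This call is the ~50 µs of inner work per call in our workload.
export const parseTask = typia.json.createValidateParse<Task>();
```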

The integrated transaction breakdown — what the bench actually measures end-to-end through Gateway DO, mesh routing, and storage commit — lives in the companion throughput post.

Cold-wake (one-time per bundle)

~262 ms above the DO infrastructure baseline (~1,494 ms — the cold-wake everyone with DOs pays). Bundle-size dominated: V8 has to fetch, parse, and instantiate the module on first wake, and parse cost scales roughly linearly with source size, so smaller bundles pay proportionally less. Amortizes to nothing on a warm DO.
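The way to isolate the facet's share is differential: time the first call into a freshly created DO (cold) and a second call (warm), then repeat against a control DO with no facet. A caller-side sketch, with a hypothetical validate RPC method (Workers clocks only advance across I/O, but the awaited stub call is I/O, so the deltas are meaningful):

```ts
// `stub.validate` is a hypothetical RPC method on the DO under test.
async function timeCall(stub: any, payload: string): Promise<number> {
  const t0 = performance.now();
  await stub.validate(payload);
  return performance.now() - t0;
}

export async function coldWakeBench(ns: DurableObjectNamespace, payload: string) {
  const stub = ns.get(ns.newUniqueId()); // fresh id, so guaranteed cold
  const cold = await timeCall(stub, payload); // DO baseline + bundle load/parse
  const warm = await timeCall(stub, payload); // boundary + inner work only
  // Run the same pair against a facet-less control DO; the difference
  // in (cold - warm) between the two benches is the facet's ~262 ms.
  return { cold, warm };
}
```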

Where does the ~1.35 ms boundary cost go?

Of the ~1.4 ms total per warm call, the work the bundle actually does takes only ~50 µs. The remaining ~1.35 ms is the boundary cost — generic to any same-isolate facet workload.
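One way to separate the boundary from the inner work without instrumenting the runtime: run the same warm request through a DO that proxies to its facet and through a control DO that runs the validator inline, then difference the averages. Network and DO dispatch cancel out; what remains is the boundary. A sketch (the stubs and validate are hypothetical; the repo's bare-facet bench is the authoritative harness):

```ts
// Average per-call latency over n sequential warm calls.
async function avgWarm(stub: any, payload: string, n = 1000): Promise<number> {
  await stub.validate(payload); // warm-up call, ensures the bundle is loaded
  const t0 = performance.now();
  for (let i = 0; i < n; i++) await stub.validate(payload);
  return (performance.now() - t0) / n;
}

export async function boundaryCost(
  withFacet: any, // stub for a DO that forwards to its facet
  inline: any,    // stub for a control DO doing the same work in-process
  payload: string,
): Promise<number> {
  // ≈ 1.4 ms − 0.05 ms ≈ 1.35 ms in our fixture
  return (await avgWarm(withFacet, payload)) - (await avgWarm(inline, payload));
}
```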

A curious observation: facet RPC is roughly an order of magnitude cheaper than a network hop to a separate Worker (1.35 ms vs 5–20 ms for a typical Service Binding), but five orders of magnitude more expensive than a direct function call (1.35 ms vs ~10 ns). The first gap explains why facets exist. The second is the interesting one — same isolate, same thread, ~100,000× the cost of an in-process call. Where does it go? Some structured-clone work, some promise resolution, some scheduler bookkeeping, what else? My guess is that the bulk comes from Workers RPC treating same-isolate facet calls with the same capability machinery it uses for cross-isolate ones: Cap'n Proto-style marshalling and call-destination resolution.

Another unknown: at what point in that 1.35 ms does the input gate open? If it opens at the start of the hop, single-request latency takes the hit but throughput doesn't; if at the end, both do. It's too small a bucket to spend much effort wondering how it splits, but this is a deep dive after all.

Reproducing this

The bare-facet bench (per-call cost in isolation) lives at experiments/ts-runtime-parser-validator-spike/. The 30-type ontology fixture is at packages/ts-runtime-parser-validator/test/fixtures/benchmark-ontology-30.ts.

The integrated benches (latency + throughput) and full numbers are linked from the companion throughput post.

If you find numbers significantly different from these for your own facet workload — especially if you're seeing higher facet RPC overhead — I'd be very interested. Reach out.


Footnotes

  1. Facets are still in beta on the Workers Paid plan as of this writing. No GA timing announcement, no breaking-change entries in the Durable Objects changelog since launch. The adjacent Dynamic Worker API is receiving additive enhancements (custom limits, nullable bundle names) — evolving but compatible. We're using a stable beta of an evolving feature, not betting on shifting sand.

  2. Our workload: a typia-generated parse-validator hosted as a facet on each Nebula Star Durable Object (Nebula's per-tenant write DO). The validator is generated from a 30-type ontology — interfaces with primitives, optionals, unions, nested relationships (T, T | null, T[], Set<T>, Map<K, T>), and the standard JSDoc tags (@minimum, @format email, @default, etc.). Source: benchmark-ontology-30.ts.