Skip to main content

Running untrusted code on Cloudflare: DWL, codemode, Containers, and Sandbox compared

· 18 min read
Larry Maccherone
Founder Lumenize and Transformation.dev

Cloudflare now offers four distinct ways to run code you don't fully trust — user-submitted functions, LLM-generated scripts, plugin systems, or third-party integrations. They fall into two tiers that differ by orders of magnitude in startup time, capability, and cost:

  • V8 Isolate tier: Dynamic Worker Loader (DWL) and codemode — millisecond cold starts, JavaScript/Python only, 128 MB memory
  • Linux VM tier: Containers and Sandbox SDK — 2-3 second cold starts, any language or binary, up to 12 GB memory

We needed to understand all four for Lumenize Nebula, where vibe-coder-provided guards and validators run in DWL isolates, and heavier workloads like TypeScript type-checking may run in Containers. This post is what we learned.

The two tiers at a glance

CriterionDWLcodemodeContainerSandbox SDK
Cold start~1 ms~1 ms2-3 s2-3 s
RuntimeV8 (JS/Python/WASM)V8 (JS only)Any binaryUbuntu 22.04 (Python, Node, Git)
Network isolationglobalOutbound: nullDefault nullenableInternet flagConfigurable
Max memory128 MB128 MBUp to 12 GBInstance-dependent
Billing modelWorker requestWorker requestCPU + memory + disk per 10 ms + egressSame as Container
GA statusClosed beta (prod), open for local devExperimentalOpen betaBeta
DX levelLow-level APIHigh-level (AI SDK tool)Medium (Dockerfile + DO sidecar)High-level SDK
StateAssume statelessStatelessEphemeral diskEphemeral + R2 mounts
Best fitHot-path code dispatch, guards, validatorsLLM tool orchestrationCustom binaries, heavy computeAI agent code execution

The rest of this post unpacks each cell.


Tier 1: V8 Isolates — fast, cheap, constrained

Both DWL and codemode run inside V8 isolates — the same technology that powers every Cloudflare Worker. The difference is abstraction level.

Dynamic Worker Loader (DWL)

DWL is the low-level primitive. You get a LOADER binding, call LOADER.get(id, callback), and back comes a WorkerStub that can handle fetch() requests and Workers RPC calls. The callback returns a WorkerCode object: your code as strings, organized into modules.

// Host Worker or Durable Object
const worker = env.LOADER.get('plugin-v3', () => ({
compatibilityDate: '2025-09-12',
mainModule: 'main.js',
modules: {
'main.js': pluginCode, // JS string
'utils.js': utilsCode, // additional modules
},
env: { API_KEY: 'safe-to-share' }, // only structured-clonable values
globalOutbound: null, // block all fetch/connect
}));

// Call a named entrypoint's RPC method — no fetch() needed
const entrypoint = worker.getEntrypoint('PluginConfig');
const config = await entrypoint.getResourceConfig();
const result = await entrypoint.validate(data);

What makes DWL interesting:

  • Isolate caching. The runtime may keep an isolate warm across calls with the same id. No guarantee, but in practice you get near-zero overhead for repeated calls. Change the code? Change the id.
  • RPC, not just fetch. DWL Workers can export WorkerEntrypoint classes with regular methods. The host calls them directly over Workers RPC — no HTTP serialization, no routing boilerplate.
  • Explicit env control. The child Worker gets only what you pass in env. No inherited bindings, no ambient access. Combined with globalOutbound: null, the sandbox can't reach the network at all.
  • Module dictionaries. You can bundle dependencies (like a 143 KB mesh framework) and pass them alongside user code. The child imports them with standard ES module syntax.

What to watch out for:

  • WASM modules are supported. Since November 2025 (workerd #5460), you can include { wasm: wasmData } modules alongside { js }, { cjs }, { text }, { data }, { json }, and { py }.
  • No runtime validation of WorkerCode. Unknown properties are silently ignored (workerd #5681). TypeScript types are your only guard against typos.
  • Can't pass DurableObjectNamespace bindings directly. You get DataCloneError. However, DurableObjectClass references and ServiceStub (Fetcher) bindings can be passed via env (workerd PR #4834). In practice, the constraint pushes you toward the "inverted model" where the host DO owns storage and the DWL isolate makes decisions — which is the right pattern anyway.
  • Python is much slower to start. The docs warn that Python Workers startup may defeat the performance benefits of dynamic isolate loading, and pricing may differ at GA.
  • Closed beta for production deployment. Local dev with wrangler works today. Production requires beta signup.
  • Assume stateless. Isolate caching is an optimization, not a contract. Any mutable state should live in the host's storage.

codemode (@cloudflare/codemode)

Codemode wraps DWL into a purpose-built tool for LLM code execution. Instead of managing module dictionaries and entrypoints yourself, you define tools (as Zod schemas or JSON Schema), and codemode generates a TypeScript type declaration that goes into the LLM prompt. The LLM writes JavaScript code that calls codemode.toolName(args), and codemode executes it in an isolated DWL Worker.

import { createCodeTool } from '@cloudflare/codemode/ai';
import { DynamicWorkerExecutor } from '@cloudflare/codemode';
import { tool } from 'ai';
import { z } from 'zod';

// 1. Define your tools
const tools = {
getWeather: tool({
description: 'Get current weather for a city',
parameters: z.object({ city: z.string() }),
execute: async ({ city }) => ({ temp: 72, city }),
}),
};

// 2. Create executor and codemode tool
const executor = new DynamicWorkerExecutor({
loader: env.LOADER,
globalOutbound: null, // default — fully isolated
timeout: 30_000, // 30s execution limit
});

const codeTool = createCodeTool({ tools, executor });
// codeTool is a standard AI SDK Tool — pass it to streamText/generateText

The pipeline:

  1. generateTypes(tools) introspects Zod schemas (via zod-to-ts) or JSON Schema wrappers and produces TypeScript declarations for the LLM prompt
  2. The LLM writes an async function calling await codemode.getWeather({ city: 'Seattle' })
  3. acorn parses and normalizes the LLM's code (strips markdown fences, wraps in async function if needed)
  4. DynamicWorkerExecutor creates a DWL Worker with the code wrapped in a WorkerEntrypoint
  5. Inside the sandbox, a Proxy intercepts codemode.* calls and routes them back to the host via ToolDispatcher extends RpcTarget
  6. Console output is captured and returned in ExecuteResult.logs

When codemode helps vs. when it doesn't:

Use codemode when...Use raw DWL when...
LLM generates the codeCode is pre-authored or uploaded by users
You want AI SDK integrationYou need custom entrypoints or module loading
Tool orchestration is the goalYou're embedding a framework (like Mesh)
You want console capture + timeout out of the boxYou need fine-grained control over isolate lifecycle

Codemode gotchas:

  • Import broken in Workers (v0.1.2). The zod-to-ts dependency pulls in the TypeScript compiler, which uses __filename — a CJS global unavailable in Workers ESM. You'll get __filename is not defined at import time. This affects the entire package, not just generateTypes(), because the module graph isn't tree-shaken at the entry point level. Likely to be fixed in a future release.
  • The zod-to-ts dependency bundles the TypeScript compiler (ts.factory), which significantly increases bundle size even if you only use DynamicWorkerExecutor.
  • Only JavaScript execution — the LLM is explicitly instructed to write JS, not TypeScript.
  • needsApproval on tools is not supported yet — tools execute immediately in the sandbox.
  • The Executor interface is runtime-agnostic (great for portability), but DynamicWorkerExecutor is the only shipped implementation.

Tier 2: Linux VMs — powerful, slower, pricier

When V8's constraints don't fit — you need a native binary, filesystem access, more than 128 MB of memory, or a language beyond JS/Python — Cloudflare Containers give you full Linux VMs on Cloudflare's network. Each Container is paired with a Durable Object that acts as a programmable sidecar.

Containers

Containers run your Docker images on Cloudflare's infrastructure. You push to Cloudflare's registry, configure an instance type, and the platform handles placement, sleep/wake, and the DO sidecar.

// wrangler.jsonc
{
"containers": [{
"class_name": "MyContainer",
"image": "./Dockerfile",
"instance_type": "basic", // 1/4 vCPU, 1 GiB memory, 4 GB disk
"max_instances": 3
}]
}
// src/index.ts
import { Container } from '@cloudflare/containers';

export class MyContainer extends Container {
defaultPort = 8080;

override sleepAfter = '10m'; // auto-sleep after 10 min idle
override envVars = {
NODE_ENV: 'production',
};

async onStart() {
console.log('Container started');
}
}

Instance types and pricing (Workers Paid plan, $5/month):

TypevCPUMemoryDiskMemory cost/hrCPU cost/hr (100% util)
lite1/16256 MiB2 GB$0.0023$0.0045
basic1/41 GiB4 GB$0.009$0.018
standard-11/24 GiB8 GB$0.036$0.036
standard-216 GiB12 GB$0.054$0.072
standard-328 GiB16 GB$0.072$0.144
standard-4412 GiB20 GB$0.108$0.288

CPU is billed on active usage only (not provisioned) since November 2025. Memory and disk are billed on provisioned resources. Plus Workers request costs, DO wall-clock billing, and network egress ($0.025-0.05/GB).

Container facts:

  • Disk is ephemeral. Everything is lost on sleep/restart. Persist with R2 bucket mounts.
  • Cold start is 2-3 seconds, image-dependent. Future Firecracker snapshot support could potentially reduce this.
  • Only Cloudflare Registry. No Docker Hub or ECR pulls.
  • No autoscaling yet. Manual scaling only via getContainer(binding, id) or getRandom(binding, count). Autoscaling and latency-aware routing are planned for GA.
  • Shutdown is graceful. SIGTERM first, then SIGKILL after 15 minutes. Design for graceful shutdown.
  • Billing stacks. You pay Workers + DO + Container compute + egress. Four billing meters for one request path.

Sandbox SDK (@cloudflare/sandbox)

Sandbox SDK wraps Containers the way codemode wraps DWL — adding a high-level API on top of the raw primitive. It ships a pre-built Ubuntu 22.04 image with Python 3, Node.js, and Git pre-installed.

import { getSandbox } from '@cloudflare/sandbox';
export { Sandbox } from '@cloudflare/sandbox';

export default {
async fetch(request: Request, env: Env) {
const sandbox = getSandbox(env.Sandbox, 'my-sandbox');

// Execute a command
const result = await sandbox.exec('python3', ['-c', 'print(2 + 2)']);
// result.exitCode, result.stdout, result.stderr

// Or use the code interpreter for persistent sessions
const ctx = await sandbox.createCodeContext({ language: 'python' });
const output = await ctx.execute('import math; math.pi');

return new Response(JSON.stringify(output));
},
};

What Sandbox adds over raw Containers:

FeatureRaw ContainerSandbox SDK
Shell executionDIY HTTP APIsandbox.exec(cmd, args)
Streaming outputDIYsandbox.execStream()
File operationsDIYsandbox.files.read/write/list
Code interpreterDIYcreateCodeContext() with rich output
Process managementDIYsandbox.process.start/stop/list
Port exposureDIYsandbox.ports.expose() with preview URLs
R2 mountsConfig-onlySDK-managed
Git operationsDIYsandbox.git.clone/pull/push
Pre-installed runtimesBuild your own imagePython 3, Node.js, Git included

What Sandbox does NOT add: Authentication, authorization, input validation, rate limiting. Those are your responsibility.

Transport options: The SDK communicates with the container over HTTP (default) or WebSocket. HTTP is simpler but each SDK call is a subrequest (50 free / 1,000 paid per request). WebSocket multiplexes all calls over a single connection, avoiding subrequest limits for concurrent operations. The transport is transparent to your code.

Sandbox gotchas:

  • State is ephemeral. Files, processes, and env vars are lost when the sandbox sleeps (default: 10 minutes).
  • Within a sandbox, all code shares resources. There's no per-execution isolation. Different users or actors need separate sandbox instances.
  • Same billing as raw Containers — the SDK abstraction is free, but you're still paying for the underlying Container.

The crossover: when to use which tier

The decision between tiers is usually obvious:

You need...Use
Sub-millisecond dispatch for JS/Python codeV8 Isolate tier (DWL or codemode)
Native binaries (Go, Rust, C++)Linux VM tier (Container or Sandbox)
More than 128 MB memoryLinux VM tier
Filesystem accessLinux VM tier
Lowest possible cost per invocationV8 Isolate tier
Pre-built Ubuntu with Python/NodeSandbox SDK
Full control over the Docker imageRaw Container

The decision within each tier is about abstraction level:

QuestionRaw (DWL / Container)Wrapped (codemode / Sandbox)
Who writes the code?You or your usersThe LLM
Do you need custom module loading?Yes → RawNo → Wrapped is simpler
Do you need the AI SDK Tool interface?No → RawYes → codemode
Do you need exec/files/git/interpreter?No → Raw ContainerYes → Sandbox
How much boilerplate are you willing to write?More → Raw gives controlLess → Wrapped handles it

Can they coexist?

Yes. A single Worker can have both a LOADER binding (DWL) and a Container binding. This enables architectures where the hot path (guards, validation, config) runs in DWL with ~1 ms overhead, while cold-path operations (type-checking, code compilation, heavy compute) run in Containers with 2-3 second startup amortized across many calls.


Also worth knowing: Workers for Platforms

Workers for Platforms (WfP) is the production-grade predecessor to DWL for multi-tenant code execution. It's been GA for years and powers services like Shopify Oxygen. Where DWL uses LOADER.get(id, callback) to spawn isolates from code strings, WfP uses "dispatch namespaces" — you upload customer Workers to a namespace via API, then a dispatch Worker routes requests to them.

AspectDWLWorkers for Platforms
Code deliveryIn-memory stringsAPI upload to namespace
GA statusClosed betaGA
PricingWorker requestPer-namespace + per-request
Isolation modelIsolate per get() callIsolate per dispatched Worker
Use caseRuntime code loading, pluginsMulti-tenant SaaS platforms

If you need production-stable multi-tenant isolation today and DWL's beta status is a blocker, WfP is the proven alternative. DWL's advantage is developer ergonomics — code-as-strings is simpler than managing upload/deployment pipelines.


Security model comparison

All four technologies provide strong isolation, but the mechanisms and defaults differ:

Security aspectDWLcodemodeContainerSandbox
Isolation boundaryV8 isolateV8 isolateLinux VMLinux VM
Network defaultOpen (must set globalOutbound: null)Blocked by defaultOpen (must configure)Configurable
Env inheritanceNone — explicit onlyNoneExplicit envVarsExplicit
Filesystem accessNone (V8 has no FS)NoneFull (ephemeral)Full (ephemeral)
Process spawningNot possibleNot possibleFullFull
Resource limits128 MB, CPU time limits128 MB, 30s timeoutInstance-type dependentInstance-type dependent
Auth/authzYour responsibilityYour responsibilityYour responsibilityYour responsibility

The V8 isolate boundary is arguably stronger for code-level isolation because there's no filesystem, no process table, and no kernel to attack. The trade-off is capability — you can only run what V8 supports.


Billing deep dive

V8 Isolate tier — DWL and codemode add no billing beyond the host Worker request. You pay standard Workers pricing: $0.30 per million requests on the paid plan (first 10M/month included). The DWL isolate itself has no separate meter. This makes DWL essentially free for hot-path dispatch.

Linux VM tier — Containers stack four billing meters:

  1. Worker request — same as above
  2. Durable Object — wall-clock billing for the sidecar DO ($12.50/million GB-s)
  3. Container compute — CPU (active usage), memory (provisioned), disk (provisioned)
  4. Network egress — $0.025-0.05/GB

For a basic instance (1/4 vCPU, 1 GiB) running continuously at 20% CPU utilization:

  • Memory: 1 GiB × 3600s × $0.0000025 = $0.009/hr
  • CPU: 0.25 vCPU × 3600s × $0.000020 × 0.20 = $0.0036/hr
  • Disk: 4 GB × 3600s × $0.00000007 = $0.001/hr
  • Total: ~$0.014/hr or ~$10/month

Plus DO wall-clock and Worker request costs.

Break-even: cold-starting vs. staying warm. If each cold start costs ~3 seconds of compute and you're paying for idle time, the break-even point depends on request frequency. At 1 request/minute, the 3-second startup amortizes to 5% overhead. At 1 request/hour, you're paying ~$0.014/hr in idle costs vs. ~$0.00004 in startup compute — staying warm is 350x more expensive per request. Use sleepAfter aggressively for bursty workloads.


What we measured

We built a spike with six DWL experiments:

TestWhatResult
Basic DWL fetchDO spawns isolate, calls fetch()Works
Env passingStructured-clonable values in envWorks
WorkerEntrypoint RPCNamed entrypoint with typed methodsWorks — full RPC, objects round-trip
DO namespace bindingPass DurableObjectNamespace in envDataCloneError — confirms inverted model
Framework integration143 KB mesh bundle in module dictWorks — this.lmz, this.ctn() all functional
Auth-aware guardsMesh envelope with JWT claims propagationWorks — role-based guards in DWL

DWL benchmarks (local dev, wrangler 4.71.0 — production numbers will differ):

BenchmarkMedianp95Notes
Isolate creation (cold)1 ms2 msUnique id per call — forces fresh isolate
Isolate creation (warm)<1 ms<1 msSame id — runtime reuses cached isolate
RPC: string return<1 ms<1 msPre-warmed isolate, ping() → 'pong'
RPC: complex object<1 ms1 msPre-warmed, 100-user nested object
Module loading: 1 KB1 ms2 msCold, main + helper module
Module loading: 100 KB2 ms3 msCold, main + 98 KB helper
Module loading: 500 KB8 ms12 msCold, main + 488 KB helper
globalOutbound: null<1 ms<1 msZero overhead vs. unrestricted
codemode-equivalent1 ms1 msConsole capture + Proxy + timeout + RPC dispatch

Key takeaways: isolate creation is the dominant cost (~1 ms cold), and it scales linearly with module size. RPC marshaling, globalOutbound: null, and codemode's wrapping pattern (console capture, Proxy-based tool dispatch, setTimeout timeout) all add no measurable overhead. The caching benefit is real — once an isolate is warm, calls are sub-millisecond.

One codemode gotcha we hit: @cloudflare/codemode v0.1.2 can't currently be imported in Workers because its zod-to-ts dependency pulls in the TypeScript compiler, which uses __filename (a CJS global unavailable in Workers ESM). We benchmarked codemode's overhead by replicating its DynamicWorkerExecutor pattern directly.

tsgo type-checking benchmarks (local, @typescript/native-preview 7.0.0-dev — representative of Container workload):

Schema sizeFilesDiskMedianNotes
10 types1248 KB82 msStartup-dominated (~80 ms overhead)
100 types102416 KB91 msOnly 9 ms more than 10 types
1000 types10024 MB283 ms~0.2 ms per additional type

Each "type" is a realistic schema: 3 interfaces (resource, metadata, config), a discriminated union event type, and a type guard function. tsgo's startup overhead (~80 ms) dominates small schemas. At scale, the per-type cost is negligible. In a Container with 2-3 second cold start, the type-checking itself is the fast part — it's the Container startup you're amortizing.

Container cold start: We haven't deployed container benchmarks yet. Cloudflare's published spec is 2-3 seconds, image-dependent. A minimal tsgo-only image would add ~45 MB (the @typescript/native-preview binary). Combined with the type-checking numbers above, a full cold-start-plus-check cycle for 100 types would be roughly 2-3 seconds (startup) + 91 ms (check) ≈ 2.1-3.1 seconds total.


Decision framework

Rather than prescribing a "best for" answer, here's how we think about the choice:

Start with the constraints:

  1. Does the code need native binaries or more than 128 MB? → Linux VM tier. Otherwise → V8 Isolate tier.
  2. Can you tolerate 2-3 second cold starts? No → V8 Isolate tier only.
  3. Is the code LLM-generated? → Consider codemode (V8) or Sandbox (Linux). Both are purpose-built for AI agents.
  4. Is cost the primary concern? → V8 Isolate tier. DWL adds zero marginal cost beyond the host Worker request.

Then choose the abstraction level:

  1. Do you need fine-grained control (custom modules, entrypoints, lifecycle)? → Raw (DWL or Container).
  2. Do you want batteries-included with less code? → Wrapped (codemode or Sandbox).
  3. Are you building a multi-tenant SaaS platform that needs GA stability today? → Workers for Platforms.

In Nebula, we use DWL for hot-path operations (resource guards, validators, config) where ~1 ms overhead matters and the code is pre-authored JavaScript. Containers are reserved for operations that need native tooling — like running tsgo for TypeScript type-checking — where 2-3 second cold starts are acceptable because the results are cached.


Current limitations worth tracking

These are the rough edges we've hit or that the docs call out. All are expected to improve as the technologies mature:

  1. DWL production access requires beta signup. Local dev with wrangler works today.
  2. DWL doesn't validate WorkerCode at runtime (workerd #5681). Typos in config silently pass.
  3. codemode can't be imported in Workers (v0.1.2). The zod-to-ts → TypeScript compiler chain uses __filename, breaking ESM Workers. The wrapping patterns work fine if replicated directly.
  4. Container disk is ephemeral. Use R2 mounts for anything you need to survive sleep.
  5. Sandbox shares resources within a single instance. Different actors need different sandbox IDs.
  6. No Container autoscaling. Manual max_instances only.
  7. DWL isolate caching has no SLA. Same id may reuse an isolate. Design for statelessness.
  8. codemode's needsApproval is not wired up. Tools execute immediately without approval flow.
  9. Containers only support Cloudflare Registry. No Docker Hub or ECR.
  10. CPU pricing changed November 2025. Now based on active usage, not provisioned. Old blog posts may show higher numbers.
  11. DWL Python Workers are slow to start. The docs explicitly warn that Python's startup time "may defeat some of the benefits of dynamic isolate loading."
  12. Sandbox subrequest limits. The default HTTP transport counts each SDK call as a Worker subrequest (50 free, 1,000 paid). Use WebSocket transport for high-call-volume workloads.