Compare models in the Workbench

The Workbench is a prompt playground in the dashboard. Use it to run one prompt across several models at once and compare their responses side by side, including latency, token counts, and estimated cost, without writing any code. Every run is sent through the gateway like a normal request, so it also appears in your Analytics.

Run a prompt across models

In the dashboard, open Workbench.
Write your prompt: add a system message and one or more user turns. To reuse the prompt as a template with different inputs, use {{variable}} placeholders and fill in their values in the side panel.
Select Add model to add a column for each model you want to compare. Each column has its own temperature, max tokens, top P, and seed settings.
Select Run (or press ⌘/Ctrl+Enter). Responses stream into each column, and every column reports latency, time-to-first-token, token counts, and estimated cost.
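The {{variable}} placeholders from step 2 work like simple template slots: each one is replaced by the value you enter in the side panel before the prompt is sent. The following is a minimal sketch of that idea in Python, assuming plain string substitution. The function name fill_template is hypothetical, and the Workbench's actual rules (escaping, whitespace, what happens to an empty value) may differ.

```python
import re

# Matches {{name}} or {{ name }}; the name is letters, digits, or underscores.
# Hypothetical pattern: the Workbench's real placeholder syntax rules may differ.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every {{name}} placeholder with values[name].

    Raises KeyError when a placeholder has no value, so an unfilled
    variable is caught before the prompt would be sent to any model.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError("no value for placeholder {{" + name + "}}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


template = "Summarize this {{doc_type}} in {{length}} sentences:\n{{text}}"
print(fill_template(template, {
    "doc_type": "email",
    "length": "2",
    "text": "The launch moved to Friday. Please update the release notes.",
}))
```

Failing loudly on a missing value is one reasonable design choice; it keeps a literal {{text}} from silently reaching every model column.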

Compare the results

Toggle Rendered / Raw to switch between formatted and plain output.
Pin a column as the baseline and turn on Diff vs baseline to highlight differences.
Turn on Sync scroll to scroll all outputs together.
Copy all outputs as Markdown, CSV, or JSON.

Run a dataset

Switch to Dataset mode to test many inputs at once:

Import a CSV dataset and map its columns to your {{variable}} placeholders.
Check the upper-bound cost estimate, then run the full matrix: every row against every model.
Review the per-model aggregates (latency, cost, tokens).
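A dataset run sends one request per (row, model) pair, so 50 rows against 3 models is 150 requests. The upper-bound cost estimate is the worst case for that matrix. The sketch below shows one plausible way to compute such a bound, assuming each request bills its full prompt and generates exactly its column's max tokens. The ModelColumn type, the per-million-token prices, and the single prompt-token count per row are all hypothetical simplifications; in practice each model's tokenizer counts tokens differently, and the Workbench may compute its estimate another way.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class ModelColumn:
    name: str            # hypothetical model identifier
    input_price: float   # USD per 1M input tokens (hypothetical)
    output_price: float  # USD per 1M output tokens (hypothetical)
    max_tokens: int      # the column's max tokens setting


def upper_bound_cost(row_prompt_tokens: list[int], columns: list[ModelColumn]) -> float:
    """Worst-case cost of the full matrix: every row against every model.

    Assumes each request is billed for its whole prompt and produces
    exactly max_tokens of output, which is why this is an upper bound.
    """
    total = 0.0
    for prompt_tokens, col in product(row_prompt_tokens, columns):
        total += prompt_tokens * col.input_price / 1_000_000
        total += col.max_tokens * col.output_price / 1_000_000
    return total


rows = [400, 600, 500]  # prompt tokens per dataset row, after filling variables
columns = [
    ModelColumn("model-a", input_price=1.0, output_price=2.0, max_tokens=500),
    ModelColumn("model-b", input_price=3.0, output_price=15.0, max_tokens=1000),
]
print("requests:", len(rows) * len(columns))
print(f"upper-bound cost: ${upper_bound_cost(rows, columns):.4f}")
```

Because output is assumed to hit max tokens on every request, lowering a column's max tokens setting lowers this bound directly, even if the actual responses are shorter.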

Score outputs with an LLM judge

Use the Judge panel to rate outputs against a rubric:

Absolute scoring grades each output on its own.
Pairwise scoring compares each model against a baseline column.

Pick a judge model and define the rubric dimensions. The judge scores each completed output, and the panel shows aggregate scores per model.
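To make the two scoring modes concrete, here is a minimal sketch of how per-model aggregates could be derived from judge results. It assumes absolute scores arrive as (model, rubric dimension, score) records and pairwise verdicts as "win", "tie", or "loss" against the baseline. The record shapes, function names, and the tie-counts-as-half convention are hypothetical; the Workbench may aggregate differently.

```python
from collections import defaultdict
from statistics import mean


def aggregate_absolute(scores: list[tuple[str, str, float]]) -> dict[str, dict[str, float]]:
    """Absolute scoring: average each model's scores per rubric dimension.

    Each record is (model, dimension, score); this shape is hypothetical.
    """
    by_model: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for model, dimension, score in scores:
        by_model[model][dimension].append(score)
    return {
        model: {dim: mean(vals) for dim, vals in dims.items()}
        for model, dims in by_model.items()
    }


def win_rate(verdicts: list[str]) -> float:
    """Pairwise scoring: share of comparisons a model wins against the
    baseline column, counting a tie as half a win (one common convention)."""
    if not verdicts:
        raise ValueError("no pairwise verdicts to aggregate")
    points = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return sum(points[v] for v in verdicts) / len(verdicts)


print(aggregate_absolute([
    ("model-a", "accuracy", 4), ("model-a", "accuracy", 2),
    ("model-a", "tone", 3), ("model-b", "accuracy", 5),
]))
print(win_rate(["win", "tie", "loss", "win"]))
```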

Reuse and track runs

History saves every run. Reopen a past run to inspect it, or re-run it against the current models to track regressions over time.
Save frequently used prompts to the prompt library and load them later.

Keyboard shortcuts

Keys	Action
⌘/Ctrl + Enter	Run
Esc	Cancel the current run
⌘/Ctrl + 1–9	Jump to the Nth model column

Next steps

Read back the runs you made here: Read your logs and costs
See every model you can add: Providers and models