Compare models in the Workbench
The Workbench is a prompt playground in the dashboard. Run one prompt across several models at once and compare their responses side by side — including latency, tokens, and estimated cost — without writing any code. Every run goes through the gateway like a normal request, so it also shows up in your Analytics.
Run a prompt across models
- In the dashboard, open Workbench.
- Write your prompt. Add a system message and one or more user turns. Use `{{variable}}` placeholders and fill their values in the side panel to reuse a template with different inputs.
- Select Add model to add columns. Compare several models at once, and set each column’s temperature, max tokens, top P, and seed.
- Select Run (or press ⌘/Ctrl+Enter). Responses stream into each column, and every column reports latency, time-to-first-token, token counts, and estimated cost.
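
The `{{variable}}` placeholders behave like simple template substitution. As a rough illustration only, the idea can be sketched in Python. The Workbench's own implementation is not documented here, so `fill_template` and the exact placeholder syntax it accepts (word characters between double braces, no inner whitespace) are assumptions made for this example.

```python
import re

# Hypothetical helper for illustration; not part of the Workbench or any SDK.
# Assumes placeholders look like {{name}} with only word characters inside.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its value; fail on missing names."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


template = "Summarize the following {{doc_type}} in {{language}}."
print(fill_template(template, {"doc_type": "contract", "language": "French"}))
# prints: Summarize the following contract in French.
```

Raising on a missing value is a design choice for this sketch: it surfaces an unfilled placeholder before the prompt is sent, rather than sending literal braces to a model.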
Compare the results
- Toggle Rendered / Raw to switch between formatted and plain output.
- Pin a column as the baseline and turn on Diff vs baseline to highlight differences.
- Turn on Sync scroll to scroll all outputs together.
- Copy all the outputs as Markdown, CSV, or JSON.
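
To give a feel for what a baseline diff highlights, here is a minimal sketch using Python's standard `difflib` module. It is not how the Workbench computes its diff (that is not documented here); `diff_vs_baseline` is a hypothetical helper, and a line-level comparison is an assumption made for this example.

```python
import difflib


def diff_vs_baseline(baseline: str, candidate: str) -> list[str]:
    """Return only the lines that differ between two outputs.

    Lines prefixed with "- " appear only in the baseline;
    lines prefixed with "+ " appear only in the candidate.
    """
    lines = difflib.ndiff(baseline.splitlines(), candidate.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [line for line in lines if line.startswith(("- ", "+ "))]


baseline_output = "Step 1: open the file\nStep 2: parse it\nStep 3: close the file"
candidate_output = "Step 1: open the file\nStep 2: validate and parse it\nStep 3: close the file"

for line in diff_vs_baseline(baseline_output, candidate_output):
    print(line)
```

The same comparison can be run on outputs copied out of the Workbench as JSON or Markdown when you want to inspect differences offline.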
Run a dataset
Switch to Dataset mode to test many inputs at once:
- Import a dataset (CSV) and map its columns to your `{{variables}}`.
- Run the full matrix: every row against every model.
- Check the upper-bound cost estimate before you run, then review per-model aggregates (latency, cost, tokens) once the run completes.
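
The size of the matrix and a worst-case cost can be reasoned about with a short sketch. Everything specific below is an assumption for illustration: the model names, the per-token prices, the fixed prompt size per row, and the idea that the upper bound treats every cell as using its full max-token budget. The Workbench's actual estimate may be computed differently.

```python
from itertools import product

# Assumed, illustrative prices in USD per 1M tokens; not real rates.
PRICES = {
    "model-a": {"input": 1.00, "output": 4.00},
    "model-b": {"input": 0.25, "output": 1.00},
}

# Stand-in for rows imported from a CSV and mapped to a {{question}} variable.
rows = [
    {"question": "What is a gateway?"},
    {"question": "Explain rate limiting."},
    {"question": "Define latency."},
]


def upper_bound_cost(rows, models, prompt_tokens_per_row, max_tokens):
    """Worst case: every (row, model) cell spends its full max-token budget."""
    total = 0.0
    for _row, model in product(rows, models):
        price = PRICES[model]
        total += prompt_tokens_per_row * price["input"] / 1_000_000
        total += max_tokens * price["output"] / 1_000_000
    return total


models = list(PRICES)
cells = list(product(rows, models))
print(len(cells))  # 3 rows x 2 models = 6 cells
print(f"{upper_bound_cost(rows, models, 200, 500):.5f}")  # prints 0.00825
```

Because the cell count is rows times models, adding one model column to a large dataset raises the run's cost by a full pass over every row.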
Score outputs with an LLM judge
Use the Judge panel to rate outputs against a rubric:
- Absolute scoring grades each output on its own.
- Pairwise scoring compares each model against a baseline column.
Pick a judge model and define the rubric dimensions; the judge scores the completed outputs and shows aggregate scores per model.
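
The two scoring modes aggregate differently, which a small sketch can make concrete. The data below is invented, the 1 to 5 scale and the rubric dimensions are assumptions, and counting a tie as half a win is a choice made for this example rather than documented Workbench behavior.

```python
from statistics import mean

# Hypothetical absolute scores: one dict of rubric dimensions per output.
absolute_scores = {
    "model-a": [{"accuracy": 5, "clarity": 4}, {"accuracy": 4, "clarity": 4}],
    "model-b": [{"accuracy": 3, "clarity": 5}, {"accuracy": 4, "clarity": 3}],
}


def aggregate_absolute(scores):
    """Average each rubric dimension across all outputs of a model."""
    result = {}
    for model, outputs in scores.items():
        dimensions = outputs[0].keys()
        result[model] = {d: mean(o[d] for o in outputs) for d in dimensions}
    return result


# Hypothetical pairwise verdicts for one model against the baseline column.
pairwise_verdicts = ["win", "loss", "tie", "win"]


def win_rate(verdicts):
    """Share of comparisons won against the baseline; a tie counts as half."""
    points = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return sum(points[v] for v in verdicts) / len(verdicts)


print(aggregate_absolute(absolute_scores))
print(win_rate(pairwise_verdicts))  # (1 + 0 + 0.5 + 1) / 4 = 0.625
```

Absolute scores are comparable across runs as long as the rubric stays fixed, while a pairwise win rate only has meaning relative to the chosen baseline.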
Reuse and track runs
- History saves every run. Reopen a past run to inspect it, or re-run it against the current models to track regressions over time.
- Save frequently used prompts to the prompt library and load them later.
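
If you export the metrics from a past run and from its re-run, a regression check can be as simple as the sketch below. The metric names, the 10% tolerance, and the `find_regressions` helper are all hypothetical; the sketch also assumes that lower is better for every metric it compares.

```python
def find_regressions(
    previous: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.10,
) -> dict[str, float]:
    """Return metrics whose value rose by more than `tolerance` (relative).

    Assumes lower is better for every metric (latency, cost, tokens).
    """
    regressions = {}
    for name, old in previous.items():
        new = current.get(name)
        if new is None or old == 0:
            continue  # Metric missing from the re-run, or no meaningful base.
        change = (new - old) / old
        if change > tolerance:
            regressions[name] = change
    return regressions


# Invented example values for one model column.
previous_run = {"latency_ms": 800.0, "cost_usd": 0.0020, "output_tokens": 400.0}
rerun = {"latency_ms": 1000.0, "cost_usd": 0.0021, "output_tokens": 380.0}

print(find_regressions(previous_run, rerun))  # prints {'latency_ms': 0.25}
```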
Keyboard shortcuts
| Keys | Action |
|---|---|
| ⌘/Ctrl + Enter | Run |
| Esc | Cancel the current run |
| ⌘/Ctrl + 1–9 | Jump to the Nth model column |
Next steps
- Read back the runs you made here: Read your logs and costs
- See every model you can add: Providers and models