Compare models in the Workbench

The Workbench is a prompt playground in the dashboard. Run one prompt across several models at once and compare their responses side by side — including latency, tokens, and estimated cost — without writing any code. Every run goes through the gateway like a normal request, so it also shows up in your Analytics.

Run a prompt across models

  1. In the dashboard, open Workbench.
  2. Write your prompt. Add a system message and one or more user turns. Use {{variable}} placeholders and fill their values in the side panel to reuse a template with different inputs.
  3. Select Add model to add columns. Compare several models at once, and set each column’s temperature, max tokens, top P, and seed.
  4. Select Run (or press ⌘/Ctrl+Enter). Responses stream into each column, and every column reports latency, time-to-first-token, token counts, and estimated cost.
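
The {{variable}} placeholders from step 2 can be reproduced outside the dashboard, for example to preview a template before pasting it in. The following is a minimal sketch, not the Workbench's own implementation: the function name fill_template and the rule that a missing value raises an error are assumptions made for illustration.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with values[name]."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # Assumption: fail loudly rather than send a half-filled prompt.
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]
    return PLACEHOLDER.sub(substitute, template)

template = "Summarize the following {{doc_type}} in {{language}}."
prompt = fill_template(template, {"doc_type": "changelog", "language": "French"})
print(prompt)
```

Raising on a missing value is a design choice for this sketch; the Workbench may handle unfilled placeholders differently.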

Compare the results

  • Toggle Rendered / Raw to switch between formatted and plain output.
  • Pin a column as the baseline and turn on Diff vs baseline to highlight differences.
  • Turn on Sync scroll to scroll all outputs together.
  • Copy all the outputs as Markdown, CSV, or JSON.
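
After copying the outputs out of the Workbench, a similar baseline comparison can be run offline. This sketch uses Python's standard difflib module to produce a line-level diff; it illustrates the idea only and is not how the Diff vs baseline toggle is implemented. The sample strings are made up.

```python
import difflib

def diff_vs_baseline(baseline: str, candidate: str) -> list[str]:
    """Return unified-diff lines showing how candidate differs from baseline."""
    return list(difflib.unified_diff(
        baseline.splitlines(),
        candidate.splitlines(),
        fromfile="baseline",
        tofile="candidate",
        lineterm="",
    ))

# Made-up outputs standing in for two model columns.
baseline_output = "Paris is the capital of France.\nIt lies on the Seine."
candidate_output = "Paris is the capital of France.\nIt sits on the Seine river."

for line in diff_vs_baseline(baseline_output, candidate_output):
    print(line)
```

Lines prefixed with - appear only in the baseline and lines prefixed with + appear only in the candidate. Identical outputs produce an empty list.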

Run a dataset

Switch to Dataset mode to test many inputs at once:

  1. Import a dataset (CSV) and map its columns to your {{variables}}.
  2. Run the full matrix — every row against every model.
  3. Review the per-model aggregates (latency, cost, tokens). The upper-bound cost estimate appears before you run.
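
To get a feel for how the matrix grows and what an upper-bound cost means, the steps above can be sketched as follows. Everything specific here is an assumption: the model names, the prices, and the prompt_tokens ceiling are hypothetical, and the Workbench may compute its own estimate differently. The sketch treats the worst case as every request sending prompt_tokens input tokens and every response using the full max tokens setting.

```python
import csv
import io

# Hypothetical prices in USD per 1M tokens; substitute real rates.
PRICES = {
    "model-a": {"input": 1.00, "output": 4.00},
    "model-b": {"input": 0.25, "output": 1.00},
}

def expand_matrix(rows, models):
    """Pair every dataset row with every model."""
    return [(row, model) for row in rows for model in models]

def upper_bound_cost(cells, prompt_tokens, max_tokens):
    """Worst case: each cell sends prompt_tokens and receives max_tokens."""
    total = 0.0
    for _row, model in cells:
        price = PRICES[model]
        total += (prompt_tokens * price["input"]
                  + max_tokens * price["output"]) / 1_000_000
    return total

# A tiny inline CSV standing in for an imported dataset.
dataset = "topic,tone\ncats,formal\ndogs,casual\n"
rows = list(csv.DictReader(io.StringIO(dataset)))

cells = expand_matrix(rows, ["model-a", "model-b"])
cost = upper_bound_cost(cells, prompt_tokens=200, max_tokens=500)
print(len(cells), "runs, at most", f"${cost:.4f}")
```

The number of runs is rows times models, so a 500-row dataset against four models is 2,000 requests.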

Score outputs with an LLM judge

Use the Judge panel to rate outputs against a rubric:

  • Absolute scoring grades each output on its own.
  • Pairwise scoring compares each model against a baseline column.

Pick a judge model and define the rubric dimensions; the judge scores the completed outputs and shows aggregate scores per model.
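
As an illustration of per-model aggregates under absolute scoring, the sketch below averages each rubric dimension per model. The record layout, the dimension names, and the choice of the mean as the aggregate are all assumptions; the Judge panel's actual aggregation is not specified here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judge results: one record per scored output.
scores = [
    {"model": "model-a", "accuracy": 4, "clarity": 5},
    {"model": "model-a", "accuracy": 3, "clarity": 4},
    {"model": "model-b", "accuracy": 5, "clarity": 3},
    {"model": "model-b", "accuracy": 4, "clarity": 4},
]

def aggregate(records, dimensions):
    """Average each rubric dimension per model."""
    by_model = defaultdict(list)
    for record in records:
        by_model[record["model"]].append(record)
    return {
        model: {dim: mean(r[dim] for r in model_records) for dim in dimensions}
        for model, model_records in by_model.items()
    }

summary = aggregate(scores, ["accuracy", "clarity"])
for model, dims in summary.items():
    print(model, dims)
```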

Reuse and track runs

  • History saves every run. Reopen a past run to inspect it, or re-run it against the current models to track regressions over time.
  • Save frequently used prompts to the prompt library and load them later.

Keyboard shortcuts

Keys              Action
⌘/Ctrl + Enter    Run
Esc               Cancel the current run
⌘/Ctrl + 1–9      Jump to the Nth model column
