UK AI Assurance

Author & test a case

Write a test case — optionally try it against a model and score the answer. Synthetic data only.

Patient scenario / question

Category (pick an existing one or type a new one)

Severity

Expected safe behaviour (optional) Pass criteria (optional)

A tested answer + verdict are saved with the case.

Datasets

The prompt library — import from any source, then run or cherry-pick it from any engine.

Dataset	Source	Origin	Cases	Added by

Import garak probes → datasets

Pull garak attack prompts into source: garak datasets. Search all 189, tick one or more — each becomes its own dataset.

Family

Search

Limit

Import Inspect tasks → datasets

Borrow Inspect benchmark questions into source: inspect datasets. Search all tasks, tick one or more — each becomes its own dataset. For the official result, run the Inspect kind on the Runs page.

Group

Search

Limit

Import a dataset

Import from

Dataset name

Preset

Limit

HF repo id

Config (optional)

Split

Coverage

What's assessed, mapped to the 4 building blocks and 8 safety dimensions. Status is computed from completed runs.

Assurance dashboard for system (target model)

The 4 building blocks

Block	How it runs here	Status

The 8 safety dimensions

Dimension	Risk	Scored cases	Pass	Evidence	Status

To activate a dimension, run its suite until it clears the case bar. Hover a row for what it needs.

Leaderboard

Every run and its results. Generate an Assurance Report from any row.

Run	Target (receiver)	Attacker	Grader model	Human reviewers	Auto pass	Human ✓/✗/?

Runs

Run a model under test; results feed scoring, coverage and reports.

New run

Run name

Run kind

Models

Target model (the system under test)

Attacker model (red-team)

Grader model (optional)

Options

Max tokens

reasoning on

Run	Target	Attacker	Grader	Progress	Auto-grade

Users

Create accounts and reset passwords.

Add user

Username

Password

Role

Permissions (admins hold all; grant these to reviewers individually)

use the live model (Review / Author) create runs (Runs page) manage datasets (import / delete)

User	Role	Status	Permissions

Model providers

Connect target, attacker and grader models by API.

Add provider

Label

Kind

Model id Base URL API key (stored locally)

Which kind do I pick?

openai-compatible is the catch-all — Gemini, Qwen, DeepSeek, Mistral, Grok and most others expose an OpenAI-style endpoint. Anthropic and local Ollama have dedicated kinds.

Model	Kind	Base URL	Example model id
OpenAI	openai-compatible	`https://api.openai.com/v1`	gpt-4o
Google Gemini	openai-compatible	`https://generativelanguage.googleapis.com/v1beta/openai/`	gemini-2.5-pro
Qwen (Alibaba DashScope)	openai-compatible	`https://dashscope-intl.aliyuncs.com/compatible-mode/v1`	qwen-max
DeepSeek	openai-compatible	`https://api.deepseek.com`	deepseek-chat
Mistral	openai-compatible	`https://api.mistral.ai/v1`	mistral-large-latest
xAI Grok	openai-compatible	`https://api.x.ai/v1`	grok-2
OpenRouter (one key → most models)	openai-compatible	`https://openrouter.ai/api/v1`	google/gemini-2.5-pro
Anthropic Claude	anthropic	— not needed —	claude-opus-4-8
Local (Ollama / vLLM / LM Studio)	ollama or openai-compatible	`http://localhost:11434` · `/v1`	gemma4-assurance

If a model has no OpenAI-compatible endpoint, route it through OpenRouter.

Label	Kind	Model	Status

My account

Your sign-in details and password.

Signed in as

Change password

Current password New password Confirm new password

App settings

Owner only. Brand and theme the platform — changes apply to everyone instantly.

Branding

App name

Subtitle

Logo

PNG/SVG/JPG, kept inline in the DB. Small files only (≈400 KB max).

Theme

Two colours drive the whole UI — accent and the navigation rail.

Accent colour

Sidebar colour

Live preview is applied as you edit. Nothing is saved until you press Save settings.

Review & score

Author & test a case

Datasets

Import garak probes → datasets

Import Inspect tasks → datasets

Import a dataset

Coverage

The 4 building blocks

The 8 safety dimensions

Leaderboard

Report

Runs

New run

Users

Add user

Model providers

Add provider

Which kind do I pick?

My account

Change password

App settings

Branding

Theme