LLM-as-a-Judge Prompt Generator

Build an evaluator that actually agrees with human judgments.

Paste a trace and get a research-grounded judge (evaluator prompt) with drop-in code and a 3-judge stress test. Free, no signup.

Paste an existing trace, span, or system prompt.

We pre-fill the wizard below from what you paste. You review and edit.

or build manually

What are you evaluating?

Pointwise scores one response. Pairwise compares two and is only useful if you A/B test.

Mode

Template

This judge will read {{input}} and {{output}}. Progress auto-detects these on import. No variable mapping needed.
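To make the placeholder convention concrete, here is a minimal sketch of how a `{{name}}` template could be filled in before it is sent to a judge model. The `render` helper and the sample values are hypothetical and written for illustration only; they are not the tool's own drop-in code.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace each {{name}} with variables[name]; raise if a name is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for placeholder {{{{{name}}}}}")
        # Returned from a function, so backslashes in the value stay literal.
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical trace fields, for illustration only.
template = "Input:\n{{input}}\n\nOutput:\n{{output}}"
print(render(template, {"input": "What is 2 + 2?", "output": "4"}))
```

Raising on a missing name, rather than leaving the placeholder in place, keeps a half-filled prompt from reaching the judge unnoticed.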

Live preview

Evaluator Prompt

You are an impartial evaluator. Decide whether a single response satisfies one criterion.

Criterion: faithfulness.
pass: Every factual claim in the response is supported by the reference. No invented entities, numbers, or quotes.
fail: The response contains at least one claim that is not supported by, or directly contradicts, the reference.

Procedure:
1. Enumerate every factual claim in the response: named entities, numbers, dates, quotes, attributions, and definitive statements about the world.
2. For each claim, locate the specific span in the reference that supports it.
3. If a claim has no supporting span, mark it unsupported. If the reference contradicts a claim, mark it contradicted.
4. Direct paraphrases of explicit reference content are acceptable. Speculative leaps and additions are not.
5. Pass only if every claim is either directly supported or a faithful paraphrase. A single unsupported or contradicted claim fails.
6. Decide pass or fail.

Ground your verdict in the reference, not in your own world knowledge. If the reference is silent on a claim, treat it as unsupported. Do not fill gaps from what you already know.

Ignore length, formatting, and self-identification cues. Do not reward verbosity.

Now evaluate the following:

Input:
{{input}}

Output:
{{output}}

Reference:
{{reference}}

Briefly explain your reasoning in 2 to 5 sentences, then state your verdict.
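The pass rule in steps 3 to 5 of the procedure can be sketched as a small aggregation over per-claim labels. This is an illustration of the rule, not the tool's implementation: the `ClaimStatus` names are hypothetical, and the per-claim labels are assumed to come from the judge or from a human annotator.

```python
from enum import Enum


class ClaimStatus(Enum):
    SUPPORTED = "supported"        # a specific reference span backs the claim
    PARAPHRASE = "paraphrase"      # faithful restatement of explicit reference content
    UNSUPPORTED = "unsupported"    # the reference is silent on the claim
    CONTRADICTED = "contradicted"  # the reference says otherwise


PASSING = {ClaimStatus.SUPPORTED, ClaimStatus.PARAPHRASE}


def faithfulness_verdict(statuses: list) -> str:
    """Pass only if every claim is supported or a faithful paraphrase.

    Assumption: a response with no factual claims passes, because there
    is nothing left unsupported.
    """
    return "pass" if all(s in PASSING for s in statuses) else "fail"


# One unsupported claim is enough to fail the whole response.
labels = [ClaimStatus.SUPPORTED, ClaimStatus.PARAPHRASE, ClaimStatus.UNSUPPORTED]
print(faithfulness_verdict(labels))
```

Because the rule is all-or-nothing, the number of claims does not soften the outcome, which matches the instruction not to reward verbosity.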

Score Range Prompt

For the verdict, use one word only: pass or fail.
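Since the judge is asked for a short explanation followed by a one-word verdict, the calling code needs to pull that verdict out of the reply. A minimal sketch, assuming the verdict is the last standalone "pass" or "fail" in the text; `parse_verdict` is a hypothetical helper, and a production setup might prefer structured output instead.

```python
import re
from typing import Optional

# Whole words only, so "failed" or "passes" in the explanation are not matched.
VERDICT = re.compile(r"\b(pass|fail)\b", re.IGNORECASE)


def parse_verdict(reply: str) -> Optional[str]:
    """Return 'pass' or 'fail' from a judge reply, or None if neither appears.

    Assumption: the reasoning comes first and the verdict last, so the
    final match is taken as the verdict.
    """
    matches = VERDICT.findall(reply)
    return matches[-1].lower() if matches else None


reply = "The 2019 revenue figure has no supporting span in the reference. Verdict: FAIL"
print(parse_verdict(reply))
```

Returning `None` instead of guessing lets the caller retry or flag the sample when the judge ignores the output format.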