Manual Evaluation in the Varex UI from the Playground
We provide a simple way to evaluate your data manually in the Varex UI. This is useful when you want to evaluate a small subset of questions and verify that the model's responses are correct.

Metrics and Evaluation
Evaluation metrics are quantitative tools used to gauge the performance and effectiveness of an Agent. These metrics assess various aspects of an Agent's responses, including relevance to the question asked, correctness, and context fidelity.
Answer Relevance
The "Answer Relevance" metric evaluates the extent to which a generated answer matches the question posed. Answers that are incomplete or include unnecessary information receive lower scores. This metric hinges on a direct comparison between the question and its answer.
Example:
- Question: Where is France and what is its capital?
- Low relevance answer: France is in Western Europe.
- High relevance answer: France is located in Western Europe, and its capital is Paris.
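The idea above can be sketched in code. This is a toy word-overlap score, not the embedding-based comparison a real evaluator would use; the function name and the stopword list are illustrative assumptions.

```python
# Toy answer-relevance score: what fraction of the question's content
# words does the answer address? Incomplete answers score lower.

def relevance_score(question: str, answer: str) -> float:
    stopwords = {"where", "is", "and", "what", "its", "the", "a", "of"}
    q_terms = {w.strip("?.,").lower() for w in question.split()} - stopwords
    a_terms = {w.strip("?.,").lower() for w in answer.split()}
    if not q_terms:
        return 0.0
    # Fraction of question content words covered by the answer.
    return len(q_terms & a_terms) / len(q_terms)

question = "Where is France and what is its capital?"
low = "France is in Western Europe."
high = "France is located in Western Europe, and its capital is Paris."

print(relevance_score(question, low))   # misses "capital"
print(relevance_score(question, high))  # covers both "France" and "capital"
```

Here the low-relevance answer scores 0.5 (it covers "France" but not "capital"), while the complete answer scores 1.0.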
Answer Correctness
Answer correctness addresses the semantic and factual alignment of the generated response with the actual information requested. It encompasses two key dimensions: semantic similarity and factual accuracy.
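The two dimensions can be combined into a single score. The sketch below is a simplified illustration, assuming a token-overlap similarity and equal weights; the function names, weights, and fact format are all hypothetical, not the product's formula.

```python
# Toy answer correctness: a weighted blend of semantic similarity
# (token Jaccard overlap with a reference answer) and factual accuracy
# (fraction of required facts stated in the answer).

def token_set(text: str) -> set:
    return {w.strip("?.,").lower() for w in text.split()}

def semantic_similarity(answer: str, reference: str) -> float:
    a, r = token_set(answer), token_set(reference)
    return len(a & r) / len(a | r) if a | r else 0.0

def factual_accuracy(answer: str, facts: list[str]) -> float:
    a = token_set(answer)
    # A fact counts as covered if all of its words appear in the answer.
    return sum(token_set(f) <= a for f in facts) / len(facts)

def answer_correctness(answer, reference, facts, w_sem=0.5, w_fact=0.5):
    return (w_sem * semantic_similarity(answer, reference)
            + w_fact * factual_accuracy(answer, facts))

reference = "The capital of France is Paris."
facts = ["capital Paris", "France"]
print(answer_correctness("Paris is the capital of France.", reference, facts))
```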
Faithfulness
"Faithfulness" pertains to how accurately a response reflects the given context. An answer is deemed faithful if it logically follows from the context provided.
Example:
- Question: Where and when was Einstein born?
- Context: Albert Einstein (born March 14, 1879) was a German theoretical physicist, widely regarded as one of the greatest and most influential scientists of all time.
- High fidelity answer: Einstein was born in Germany on March 14, 1879.
- Low fidelity answer: Einstein was born in Germany on March 20, 1879.
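The example above can be checked mechanically. Real faithfulness evaluators verify claims with an LLM or an entailment model; the word-level check below is only a sketch, and every name in it is illustrative.

```python
# Toy faithfulness check: split the answer into simple claims and count
# how many are supported by the context (i.e. every content word of the
# claim appears in the context).

def supported(claim: str, context: str) -> bool:
    ctx = {w.strip("?.,()").lower() for w in context.split()}
    terms = {w.strip("?.,()").lower() for w in claim.split()}
    return terms <= ctx

def faithfulness(claims, context):
    return sum(supported(c, context) for c in claims) / len(claims)

context = ("Albert Einstein (born March 14, 1879) was a German "
           "theoretical physicist.")
high_fidelity = ["Einstein born March 14, 1879"]
low_fidelity = ["Einstein born March 20, 1879"]

print(faithfulness(high_fidelity, context))  # 1.0, fully supported
print(faithfulness(low_fidelity, context))   # 0.0, "20" contradicts context
```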
Context Relevancy
This metric measures the accuracy of the Agent's search or retrieval component, assessing how well the context found aligns with the question. It is a crucial aspect of ensuring that the Agent provides responses that are not only accurate but also relevant to the inquiry at hand.
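As a rough illustration of scoring retrieval quality, each retrieved sentence can be rated against the question and the ratings averaged. A real retriever evaluation would use embeddings; this word-overlap sketch, including its names and example sentences, is only an assumption for demonstration.

```python
# Toy context relevancy: rate each retrieved sentence by its word
# overlap with the question, then average across the retrieved set.

def overlap(question: str, sentence: str) -> float:
    q = {w.strip("?.,").lower() for w in question.split()}
    s = {w.strip("?.,").lower() for w in sentence.split()}
    return len(q & s) / len(q) if q else 0.0

question = "Where and when was Einstein born?"
retrieved = [
    "Albert Einstein was born on March 14, 1879.",  # relevant
    "The speed of light is about 300,000 km/s.",    # irrelevant
]
scores = [overlap(question, s) for s in retrieved]
context_relevancy = sum(scores) / len(scores)
print(scores, context_relevancy)
```

A retriever that returns mostly irrelevant passages drags the average down, which is exactly the failure this metric is meant to surface.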