Tag: evaluation
All articles tagged "evaluation".
- What a Null Result Taught Us About AI Agent Evaluation
We tested prompt repetition on 20 parallel AI agents. Ceiling effects dominated both experiments. The null result is itself a finding about evaluation design.