Compare two AI agents head-to-head

The same brief ("build a RAG explorer for 158 UAP files") went to codex and Claude Code. Sutando ran both, graded each output, and shipped a 74-second verdict.

Sutando ran a fair side-by-side benchmark: the same `/goal` prompt and the same UAP-files brief, given to two contestants (codex + `/goal` vs. Claude Code + `/goal`). Sutando deployed each output, graded both with Playwright plus independent verification, and shipped the verdict as a 74-second video. The framework is reusable: drop in any two agents and any brief.
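The workflow above boils down to a small harness: one brief, several agents, one shared grading checklist, and a comparison of scores. The sketch below is a hypothetical illustration under stated assumptions, not Sutando's implementation. The names `head_to_head`, `run_contestant`, and `Result` are invented for this example, the agents are stand-in callables that return a string artifact, and the checks are plain boolean functions. The real run graded deployed apps with Playwright plus independent verification, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins: an agent maps a brief to an artifact (e.g. a deploy URL),
# and a check inspects that artifact and returns pass/fail.
Agent = Callable[[str], str]
Check = Callable[[str], bool]


@dataclass
class Result:
    name: str
    artifact: str
    passed: int
    total: int

    @property
    def score(self) -> float:
        # Fraction of shared checks this contestant passed.
        return self.passed / self.total if self.total else 0.0


def run_contestant(name: str, agent: Agent, brief: str, checks: List[Check]) -> Result:
    artifact = agent(brief)  # every contestant receives the identical brief
    passed = sum(1 for check in checks if check(artifact))
    return Result(name, artifact, passed, len(checks))


def head_to_head(
    brief: str, contestants: Dict[str, Agent], checks: List[Check]
) -> Tuple[List[Result], Optional[str]]:
    """Run every contestant on the same brief; return results and the winner (None on a tie)."""
    results = [run_contestant(n, a, brief, checks) for n, a in contestants.items()]
    if not results:
        return results, None
    best = max(r.score for r in results)
    leaders = [r.name for r in results if r.score == best]
    return results, (leaders[0] if len(leaders) == 1 else None)


if __name__ == "__main__":
    brief = "build a RAG explorer for 158 UAP files"
    # Toy checks standing in for browser-driven grading.
    checks: List[Check] = [
        lambda art: "search" in art,
        lambda art: "158" in art,
    ]
    contestants: Dict[str, Agent] = {
        "agent_a": lambda b: "search ui over 158 docs",
        "agent_b": lambda b: "static page",
    }
    results, winner = head_to_head(brief, contestants, checks)
    for r in results:
        print(f"{r.name}: {r.passed}/{r.total}")
    print("winner:", winner)
```

Keeping the brief and the checklist fixed while swapping only the agents is what makes the comparison fair: any score difference comes from the contestants, not from the grading.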

Want this on your Mac?

Sutando is in private beta. Request access and we'll be in touch.
