Runs agent evaluations with memoization. Only runs (model, eval) pairs that haven't been completed yet.

claude-opus-4.6 vercel-ai-gateway/claude-code anthropic/claude ...
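The memoization described above could look roughly like the following. This is a minimal sketch, not the actual implementation: the function name `run_pending`, the JSON results file, the `model::eval` key format, and the `run_eval` callback are all hypothetical choices made for illustration.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterable, List, Tuple


def run_pending(
    models: Iterable[str],
    evals: Iterable[str],
    run_eval: Callable[[str, str], Any],
    cache_path: str,
) -> List[Tuple[str, str]]:
    """Run only the (model, eval) pairs with no recorded result.

    Assumption: completed results live in a single JSON file keyed by
    "model::eval". The real tool may store its results differently.
    """
    path = Path(cache_path)
    # Load previously completed results, if any.
    results = json.loads(path.read_text()) if path.exists() else {}

    evals = list(evals)  # allow re-iteration for every model
    executed = []
    for model in models:
        for eval_name in evals:
            key = f"{model}::{eval_name}"
            if key in results:
                continue  # already completed, so skip it
            results[key] = run_eval(model, eval_name)
            # Persist after each pair so an interrupted run can resume.
            path.write_text(json.dumps(results, indent=2))
            executed.append((model, eval_name))
    return executed
```

Calling `run_pending` a second time with the same models, evals, and results file returns an empty list, because every pair is already recorded. Adding a new eval or model runs only the pairs that are missing.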