GSO Leaderboard

Loading leaderboard...

Opt@1: Estimator of fraction of tasks where a single attempt achieves ≥95% human speedup and passes correctness tests.

Opt@K: Estimator of fraction of tasks where at least one attempt among K tries achieves ≥95% human speedup and passes correctness tests. See paper for details.