EPISODE 24
OpenAI o3 vs Gemini 2.5 Pro - Many Benchmarks
ARC-AGI 1 and 2:
https://arcprize.org/blog/analyzing-o3-with-arc-agi
https://arcprize.org/leaderboard
Virology:
https://www.virologytest.ai/
PHYBench:
https://phybench-official.github.io/phybench-demo/
https://arxiv.org/pdf/2504.16074
GeoBench:
https://geobench.org/
VisualPuzzles:
https://neulab.github.io/VisualPuzzles/
NaturalBench:
https://linzhiqiu.github.io/papers/naturalbench/
Sayings and proverbs
The tweet:
https://x.com/rose_matt/status/1915355300506325282
Our benchmark:
enkk.me/proverbit
Deep Research Lightweight
The tweet:
https://x.com/OpenAI/status/1915505959931437178