Grok 3 mini crushes custom reasoning test with zero mistakes

AI

Grok-3-mini is the first model to achieve a flawless score on a custom reasoning benchmark designed to evaluate logic, inference, and resistance to distraction. The test includes several challenges, among them the “Marcus Problem,” which presents 120 shuffled sentence combinations, and the “Alice+ Problem,” which adds irrelevant details intended to confuse models.

Grok-3-mini answered correctly in every category, including on questions that stump top-tier models such as GPT-4.5 and Gemini 2.5 Pro. These tasks go beyond trivia: they assess reasoning under uncertainty, and Grok-3-mini is the first model to solve them all.
