Grok 3 mini crushes custom reasoning test with zero mistakes

Grok-3-mini is the first model to post a perfect score on a custom reasoning benchmark built to test logic, inference, and resistance to distraction. The test includes the “Marcus Problem,” which is presented in 120 shuffled sentence combinations, and the “Alice+ Problem,” which adds irrelevant details meant to throw models off.
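
To make the two test styles concrete, here is a minimal sketch in Python. It rests on assumptions the article does not confirm: 120 happens to equal 5!, the number of ways to order five sentences, so the sketch treats the shuffled variants as all orderings of a five-sentence puzzle. The sentences, the distractor text, and the function names are hypothetical and are not taken from the benchmark.

```python
from itertools import permutations

# Hypothetical five-sentence puzzle. The benchmark's real wording is not
# given in the article; these sentences are placeholders for illustration.
SENTENCES = [
    "Marcus is older than Dana.",
    "Dana is older than Lee.",
    "Lee is older than Priya.",
    "Priya is older than Sam.",
    "Who is the youngest?",
]


def shuffled_variants(sentences):
    """Return one prompt string for every ordering of the sentences."""
    return [" ".join(order) for order in permutations(sentences)]


def add_distractor(prompt, distractor):
    """Append an irrelevant detail to a prompt (assumed Alice+ style)."""
    return f"{prompt} {distractor}"


variants = shuffled_variants(SENTENCES)
print(len(variants))  # five distinct sentences give 5! = 120 orderings

noisy = add_distractor(variants[0], "Dana owns three bicycles.")
print(noisy)
```

A model that reasons reliably should give the same answer for every ordering, and the added detail should not change that answer.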

Grok-3-mini answered every question correctly across all categories, including questions that stump top-tier models such as GPT-4.5 and Gemini 2.5 Pro. The tasks are not trivia: they are meant to assess reasoning under uncertainty, and Grok-3-mini is the first model to clear all of them.