OpenAI introduces HealthBench, a benchmark developed with 262 physicians across 60 countries, featuring 5,000 realistic health conversations. Each conversation is evaluated using custom rubrics to assess AI responses in real-world medical scenarios.
This initiative aims to strengthen AI's role in healthcare by making evaluations meaningful, trustworthy, and progressive. By setting a shared standard for measuring model behavior, HealthBench is intended to foster the development of AI systems that can effectively support clinicians and improve patient outcomes.
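To make the idea of rubric-based evaluation concrete, here is a minimal sketch of how a single response might be scored against a rubric. The announcement does not describe the scoring formula, so everything here is an assumption for illustration: the `Criterion` class, the `rubric_score` function, the point values, and the choice to normalize by the maximum achievable positive points and clip to the range 0 to 1 are all hypothetical, not HealthBench's documented method.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric item (hypothetical structure, not taken from the source)."""
    description: str
    points: int  # positive = desirable behavior, negative = undesirable behavior


def rubric_score(criteria, met):
    """Score one response against a rubric.

    `met[i]` is True when a grader judged that the response satisfies
    criteria[i]. Assumed scheme: sum the points of all satisfied criteria,
    divide by the maximum achievable positive points, clip to [0, 1].
    """
    if len(criteria) != len(met):
        raise ValueError("criteria and met must have the same length")
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(c.points for c, is_met in zip(criteria, met) if is_met)
    return min(1.0, max(0.0, earned / max_points))


# Illustrative rubric; the criteria and weights are invented for this example.
example_rubric = [
    Criterion("Advises seeking emergency care for chest pain", 5),
    Criterion("Asks about symptom duration", 3),
    Criterion("States a definitive diagnosis without examination", -4),
]

# Grader judged: first criterion met, second not met, third (undesirable) met.
score = rubric_score(example_rubric, [True, False, True])
print(score)  # (5 - 4) / (5 + 3) = 0.125
```

Under this assumed scheme, negatively weighted criteria penalize harmful behavior without letting the score fall below zero, and averaging per-conversation scores would give a benchmark-level number.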