Why is the International Math Olympiad showing up in my feed?
For the first time ever, LLMs hit the gold standard at math's most prestigious competition

This week, news about the International Mathematical Olympiad (IMO) broke into the mainstream, something it has rarely done before. If you hadn’t heard of it until now, you’re not alone, even though it’s generally considered the most prestigious math competition in the world. Each year, countries send up to six participants to compete for gold, silver, or bronze medals. Contestants must be under 20 years old, must not have enrolled at a university, and must make it through a grueling selection process that varies widely by country.
The competition itself is spread across two days, with contestants getting four-and-a-half hours each day to work on three problems (six in total) spanning geometry, number theory, algebra, and combinatorics. The IMO has a guiding principle that problems must be designed so anyone with “basic math knowledge can understand what's being asked”, which might sound manageable until you realize these aren’t your typical math problems. Here’s a geometry problem from this year’s six:
Let Ω and Γ be circles with centres M and N, respectively, such that the radius of Ω is less than the radius of Γ. Suppose circles Ω and Γ intersect at two distinct points A and B. Line MN intersects Ω at C and Γ at D, such that points C, M, N and D lie on the line in that order. Let P be the circumcentre of triangle ACD. Line AP intersects Ω again at E ≠ A. Line AP intersects Γ again at F ≠ A. Let H be the orthocentre of triangle PMN. Prove that the line through H parallel to AP is tangent to the circumcircle of triangle BEF.
When I was around 10 years old, I attended a math camp. I don’t know if I was selected by my school or if my parents paid for the thing, but all I remember learning, and being really excited about, was a Dukes of Hazzard-themed problem with multiple steps that produced the result “6045508”. When you turned your calculator upside down, that number spelled “BOSSHOG”. So it should not surprise you that the only thing I understand about the IMO problems is that I wouldn’t get remotely close to solving one. (Note: the geometry problem above is problem #2 of 6, and the problems get progressively harder.)
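If you want to recreate that thrill at home, here’s a quick Python sketch of the trick. (The digit-to-letter mapping is just the usual upside-down-calculator convention; nothing here comes from the actual camp worksheet.)

```python
# The upside-down calculator trick: each digit, rotated 180 degrees, looks
# roughly like a letter, and flipping the display reverses the digit order,
# so we read the number back to front.
FLIPPED = {"0": "O", "1": "I", "2": "Z", "3": "E", "4": "H",
           "5": "S", "6": "G", "7": "L", "8": "B", "9": "G"}

def upside_down(number: int) -> str:
    """Return the word you see when the calculator is turned upside down."""
    return "".join(FLIPPED[d] for d in reversed(str(number)))

print(upside_down(6045508))  # prints "BOSSHOG"
```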
The 2025 Competition
This year’s competition saw 630 students from 110 countries, with 67 of those students winning gold by solving at least 5 of the 6 problems. But two other entrants took on the challenge this year: though not officially competing, OpenAI and Google DeepMind were given special permission by the IMO to have unreleased models work through the same problems under similar constraints.
The result? Both systems achieved gold medal performance1, answering 5 of the 6 questions correctly. What makes this interesting isn't that AI can solve these problems, but how they solved them. Both OpenAI and DeepMind claim to have done so with generalized deep reasoning models that were not specifically trained for the competition. And they used no external tools: no calculators, no code interpreters, no formal verification systems. Just sustained creative thinking and reasoning in natural language, with thought processes that lasted hours. That's significant because it suggests we're beginning to see genuine mathematical understanding in LLMs, rather than computational shortcuts or pattern matching on things they've already seen.
What caught the attention of IMO graders wasn't just that the AI systems got the right answers, but how they presented their solutions:
"We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points — a gold medal score. Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow." - IMO President Prof. Dr. Gregor Dolinar2
Just to give you an idea of how fast this is advancing: last year, at the 2024 competition, two DeepMind models required experts to first translate the six problems from natural language into “domain-specific” languages before the models could work on them. Even then, it took two to three days of computation to get results, and the models correctly answered only 4 problems. An advanced version of Gemini Deep Think, the model used in this year’s competition, did it all in natural language within the 4.5-hour time limit.
Of course, as with anything like this, there is controversy, especially around the claims that the LLMs didn’t game this in some way, or around what constitutes “fair play”. After all, the students are only allowed pen and paper. They can’t communicate with each other or with coaches. They have a psyche that can become overwhelmed, mental stamina that drains as the hours drag on. They get frustrated when they’re stumped, and as the clock ticks down, they may get sloppy with their results in a panicked rush. None of this really affects the machine. It doesn’t get impatient. It doesn’t know how to worry. It doesn’t feel the weight of representing an entire country.
And so while debates around fair play may live on, we should look at this from a different angle: high school students with limited resources scored just as well as world-class large language models with seemingly infinite advantages and resources. Maybe the true gift of the models’ successes at the IMO is appreciating how extraordinary the human mind has been all along.
—
By the way, if you’re curious to see the proof OpenAI’s model came up with for problem #2, have at it: https://github.com/aw31/openai-imo-2025-proofs/blob/main/problem_2.txt
Despite what you may have read in some outlets, the models were not awarded gold medals; they achieved scores that met the gold threshold of at least 35 out of 42 points.