Elon Musk unveils Grok-3 and Grok-3 mini that outperform other AI chatbots

In a highly anticipated live-stream, Elon Musk, owner of X and xAI and CEO of Tesla and SpaceX, unveiled the latest versions of the Grok AI chatbot: Grok-3 and Grok-3 mini.

Top members of xAI’s engineering team joined Musk in the presentation to showcase the capabilities of Grok-3 and its scaled-down version, Grok-3 mini.

According to xAI’s internal testing and public benchmarks, both Grok-3 and Grok-3 mini surpassed existing chatbots such as ChatGPT, DeepSeek-R1, and Gemini-2 Flash Thinking by a large margin.

Elon Musk founded xAI in July 2023 to compete with companies such as OpenAI, where he was an early investor before leaving after conflicts with other board members.

Elon Musk and his team at xAI unveil the Grok-3 AI chatbot to the world on Monday 17th Feb 2025. “Our mission is to understand the universe”. Credit: Elon Musk / xAI / X (Twitter).

Benchmarking Against Other Chatbots

Elon Musk’s xAI tested Grok-3 and Grok-3 mini against other advanced AI chatbots in three major areas: mathematics, science, and computer programming (coding and game development).

In these benchmark tests, Grok-3 topped the charts against existing chatbots in all three areas.

Grok-3 is designed to think deeply before answering. According to its creators, it reasons through a problem repeatedly, solving the same problem several times before concluding which solution is correct.

Graph 1: Grok-3 and Grok-3 mini benchmark results in math, science, and coding against Gemini-2 Pro, DeepSeek-V3, Claude 3.5 Sonnet, and GPT-4o. Credit: xAI / Elon Musk.

Math: AIME 2024 and 2025

During the presentation, an xAI engineer said, “Grok-3 is ready to go to college.” As Graph 1 shows, Grok-3 and Grok-3 mini performed significantly better than the competition on the AIME’24 mathematics test.

Grok-3 mini scored 40, narrowly ahead of DeepSeek-V3 at 39. The full Grok-3 model outperformed every other chatbot in the comparison with a score of 52, and its closest rival was its own smaller sibling, Grok-3 mini. GPT-4o, the model behind ChatGPT, did worst in this category with a score of only 9.

xAI has code-named the current early version of Grok-3 “Chocolate.” Elon Musk’s team at xAI is continuing to improve Grok’s model so that it can solve even more complex mathematical problems.

Graph 2: AIME 2025 performance of Grok-3 Reasoning Beta and Grok-3 mini Reasoning compared with OpenAI’s o3-mini (high) and o1, DeepSeek-R1, and Gemini-2 Flash Thinking. Credit: xAI / Elon Musk via X.

Science: GPQA

Grok-3 mini scored 65 on GPQA’s PhD-level science questions, the same as Gemini-2 Pro and Claude 3.5 Sonnet. China’s DeepSeek-V3 scored only 59 in this category.

The full Grok-3 model did exceptionally well in science, scoring 75 on the GPQA benchmark.

Grok’s strong science score is a positive sign that it may, in the future, understand the universe better than other AI chatbots.

Coding / Game Development

Graph 1 shows that both Grok-3 and Grok-3 mini hold a solid advantage over competing AI chatbots in coding and game development.

Using Grok-3 with its Big Brain mode (which applies extra compute), Elon Musk created a hybrid of Tetris and Bejeweled from the following plain-English prompt:

Using pygame, make a game that is a mix of Tetris and Bejeweled. The code could be very long. Output it as one file. Make it insanely great.

No prompt engineering was involved: the user simply typed the request. The open-ended prompt also gave Grok-3 the creative freedom to choose its own criteria when putting the game together.

“We’re seeing the beginnings of creativity,” Elon Musk said.

Screenshot of Grok-3-created hybrid video game of Tetris and Bejeweled. Credit: xAI / Elon Musk via X (live-stream recording video below).

Chatbot Arena (LMSYS) Benchmarking

The early “Chocolate” version of xAI’s Grok-3 outperformed all other AI chatbots ranked on the Chatbot Arena leaderboard.

Grok-3 scored 1400 points on Chatbot Arena. Its closest rival was Google’s Gemini-2 Flash Thinking, which scored between 1380 and 1400 (see Graph 3 below).

Graph 3: Grok-3 Chocolate vs. other AI chatbots on Chatbot Arena (LMSYS). Credit: xAI / Elon Musk via X.

https://t.co/hEfQ31gANQ
— xAI (@xai) February 18, 2025

Video: Recording of the Grok-3 AI chatbot unveiling live-stream.
