Elon Musk unveils Grok-3 and Grok-3 mini that outperform other AI chatbots


In a highly anticipated online live-stream, Elon Musk, owner of X and xAI and CEO of Tesla and SpaceX, unveiled the latest versions of the Grok AI chatbot: Grok-3 and Grok-3 mini.

Top members of xAI’s engineering team joined Musk in the presentation to showcase the capabilities of Grok-3 and its scaled-down version, Grok-3 mini.

According to xAI’s internal testing and public benchmarking tools, both Grok-3 and Grok-3 mini have surpassed existing chatbots like ChatGPT, DeepSeek R1, and Gemini-2 Flash Thinking by a large margin.

Elon Musk founded xAI in July 2023 to compete with companies like OpenAI, where he was an early investor before leaving the company following conflicts with other board members.

Elon Musk and his team at xAI unveil the Grok-3 AI chatbot to the world on Monday 17th Feb 2025. “Our mission is to understand the universe”. Credit: Elon Musk / xAI / X (Twitter).

Benchmarking Against Other Chatbots

Elon Musk’s xAI tested its Grok-3 and Grok-3 mini chatbots against other advanced AI chatbots in three major areas: Mathematics, Science, and Computer Programming (coding and game development).

In the benchmark results xAI presented, Grok-3 topped the charts against existing chatbots in all three areas.

Grok-3 is designed to think deeply. According to its creators, it reasons through the same problem multiple times before settling on what it considers the right solution.
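xAI did not explain in the presentation how Grok-3 combines its repeated attempts. One common way to turn "solve it many times, then decide" into code is majority voting over independent attempts. The sketch below illustrates that general idea only, not Grok-3's actual method; `noisy_solver` and its 60% accuracy are hypothetical stand-ins for a single reasoning attempt.

```python
import random
from collections import Counter


def noisy_solver(rng):
    """Hypothetical single attempt: returns the right answer (42) 60% of the time."""
    if rng.random() < 0.6:
        return 42
    return rng.choice([41, 43, 44])  # otherwise one of several wrong answers


def majority_vote(answers):
    """Return the most common answer and the fraction of attempts that agree with it."""
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)


rng = random.Random(0)  # seeded so repeated runs sample the same attempts
attempts = [noisy_solver(rng) for _ in range(25)]
answer, agreement = majority_vote(attempts)
print(f"final answer: {answer} (agreement: {agreement:.0%})")
```

Because wrong answers tend to scatter while correct ones repeat, the vote is usually more reliable than any single attempt.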

Graph 1: Grok-3 and Grok-3 mini benchmarking results in Maths, Science, and Coding abilities against Gemini-2 Pro, DeepSeek-V3, Claude 3.5 Sonnet, and ChatGPT 4o. Credit: xAI / Elon Musk.

Math: AIME 2024 and 2025

During the presentation, an xAI engineer said, “Grok-3 is ready to go to college.” As the bar chart in Graph 1 above shows, Grok-3 and Grok-3 mini performed significantly better than the competition on the AIME’24 mathematics test.

Even Grok-3 mini scored 40, while DeepSeek-V3 got 39 points. The larger Grok-3 outperformed every other AI chatbot in the comparison with a benchmarking score of 52. The closest to it is its own smaller sibling, Grok-3 mini. Interestingly, OpenAI’s GPT-4o did the worst in this area with a score of only 9.

xAI has code-named the current early version of Grok-3 “Chocolate”. Elon Musk’s team at xAI is continually improving the model to solve even more complex mathematical challenges.

Graph 2: AIME 2025 Performance Bar Chart: Grok-3 Reasoning Beta and Grok-3 mini reasoning compared to ChatGPT o3 mini (high), o1, DeepSeek-R1, and Gemini-2 Flash Thinking. Credit: xAI / Elon Musk via X.

Science: GPQA

The mini version of Grok-3 scored 65 points on the Ph.D.-level science questions of the GPQA benchmark, the same score as Gemini-2 Pro and Claude 3.5 Sonnet. In this area, China’s DeepSeek-V3 scored only 59 points.

The larger Grok-3 version did exceptionally well in science with 75 points in the GPQA benchmarking test.

Grok-3 acing the science test is a positive sign that it may eventually understand the universe better than other AI bots.

Coding / Game Development

Looking at Graph 1 above, we can see that both Grok-3 and Grok-3 mini hold an advantage in coding and game development over competing AI chatbots.

Using Grok-3 and its Big Brain option (extra compute), Elon Musk created a combination of Tetris and Bejeweled using the following prompt in plain English:

Using pygame, make a game that is a mix of Tetris and Bejeweled. The code could be very long. Output it as one file. Make it insanely great.

No prompt engineering was involved in making this game. The open-ended prompt also gave Grok-3 the creative freedom to choose its own criteria in putting the video game together.

“We’re seeing the beginnings of creativity,” Elon Musk said.

Screenshot of Grok-3-created hybrid video game of Tetris and Bejeweled. Credit: xAI / Elon Musk via X (live-stream recording video below).

ChatBot Arena (LMSYS) Benchmarking

xAI’s Grok-3 Chocolate (early version) outperformed all other AI chatbots tested on the ChatBot Arena benchmarking system.

Grok-3 scored 1400 points in the ChatBot Arena benchmarking test. The closest AI chatbot to Grok-3 is Google’s Gemini-2 Flash-Thinking, which scored between 1380 and 1400 (see Graph 3 below).
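To put the size of that gap in perspective, here is a rough sketch that assumes the Arena scores behave like Elo-style ratings on the usual 400-point logistic scale. That scale is an assumption of this example, not something stated in the presentation, and `expected_win_rate` is a hypothetical helper name.

```python
def expected_win_rate(rating_a, rating_b):
    """Expected share of head-to-head wins for A over B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Grok-3 at 1400 versus a rival at the low end of the quoted 1380-1400 range.
print(f"{expected_win_rate(1400, 1380):.1%}")
```

Under that assumption, a 20-point lead corresponds to winning only slightly more than half of head-to-head comparisons (roughly 53%), so the top of the leaderboard is closely contested.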

Graph 3: Grok-3 Chocolate vs other AI chatbots in ChatBot Arena benchmarking test (LMSYS). Credit: xAI / Elon Musk via X.
Video: Recording of the Grok-3 AI chatbot unveiling live-stream.


Iqtidar Ali (http://www.teslaoracle.com)
Author of more than 1500 articles on Tesla, SpaceX, and EVs. His work has been liked and tweeted by Elon Musk and other prominent influencers. You can reach him on Twitter @IqtidarAlii
