The AI wars continue to heat up. Just weeks after OpenAI declared a “code red” in its race against Google, the latter released its latest lightweight model: Gemini 3 Flash. This particular Flash is the newest member of Google’s Gemini 3 family, which started with Gemini 3 Pro and Gemini 3 Deep Think. But while this latest model is meant to be a lighter, less expensive variant of the existing Gemini 3 models, Gemini 3 Flash is actually quite powerful in its own right. In fact, it beats out both Gemini 3 Pro and OpenAI’s GPT-5.2 in some benchmarks.
Lightweight models are typically meant for more basic queries, for lower-budget requests, or to be run on lower-powered hardware. That means they’re often faster than more powerful models, which take longer to process but can do more. According to Google, Gemini 3 Flash combines the best of both those worlds, pairing Gemini 3’s “Pro-grade reasoning” with “Flash-level latency, efficiency, and cost.” While that likely matters most to developers, general users should also notice the improvements, as Gemini 3 Flash is now the default for both Gemini (the chatbot) and AI Mode, Google’s AI-powered search.
Gemini 3 Flash performance
You can see these improvements in Google’s reported benchmarking stats for Gemini 3 Flash. In Humanity’s Last Exam, an academic reasoning benchmark that tests LLMs on 2,500 questions across over 100 subjects, Gemini 3 Flash scored 33.7% with no tools, and 43.5% with search and code execution. Compare that to Gemini 3 Pro’s 37.5% and 45.8% scores, respectively, or GPT-5.2’s scores of 34.5% and 45.5%. In MMMU-Pro, a benchmark that tests a model’s multimodal understanding and reasoning, Gemini 3 Flash got the top score (81.2%), compared to Gemini 3 Pro (81%) and GPT-5.2 (79.5%). In fact, across the 21 benchmarking tests Google highlights in its announcement, Gemini 3 Flash has the top score in three: MMMU-Pro, Toolathlon, and MMMLU. Gemini 3 Pro still takes the number one spot on the most tests here (14), and GPT-5.2 topped eight tests, but Gemini 3 Flash is holding its own.
Google notes that Gemini 3 Flash also outperforms both Gemini 3 Pro and the entire 2.5 series in the SWE-bench Verified benchmark, which tests a model’s coding agent capabilities. Gemini 3 Flash scored 78%, while Gemini 3 Pro scored 76.2%, Gemini 2.5 Flash scored 60.4%, and Gemini 2.5 Pro scored 59.6%. (Of the models Google mentions in this announcement, GPT-5.2 actually scored the best on this benchmark.) It’s a close race, especially when you consider this is a lightweight model scoring alongside these companies’ flagship models.
Gemini 3 Flash cost
That might present an interesting dilemma for developers who pay to use AI models in their programs. Gemini 3 Flash costs $0.50 per million input tokens (what you ask the model to do), and $3.00 per million output tokens (the result the model returns from your prompt). Compare that to Gemini 3 Pro, which costs $2.00 per million input tokens and $12.00 per million output tokens, or GPT-5.2’s $3.00 and $15.00 costs, respectively. For what it’s worth, it’s not as cheap as Gemini 2.5 Flash ($0.30 and $2.50), or Grok 4.1 Fast for that matter ($0.20 and $0.50), but it does outperform those models in Google’s reported benchmarks. Google notes that Gemini 3 Flash uses 30% fewer tokens on average than 2.5 Pro, which will save on cost, while also being three times faster.
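To put those per-million-token prices in perspective, here’s a quick back-of-the-envelope sketch in Python. The rates come from the figures quoted above; the sample workload (5 million input tokens, 1 million output tokens) is an arbitrary illustration, not anything Google publishes.

```python
# Per-million-token prices (input $/1M, output $/1M) as quoted in this article.
PRICES = {
    "gemini-3-flash": (0.50, 3.00),
    "gemini-3-pro": (2.00, 12.00),
    "gpt-5.2": (3.00, 15.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "grok-4.1-fast": (0.20, 0.50),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the listed per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Hypothetical workload: 5M input tokens, 1M output tokens.
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 5_000_000, 1_000_000):.2f}")
```

On that workload, Gemini 3 Flash comes out to $5.50 against $22.00 for Gemini 3 Pro and $30.00 for GPT-5.2, which is the gap developers would be weighing.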
If you’re someone who needs LLMs like Gemini 3 Flash to power your products, but you don’t want to pay the higher costs associated with more powerful models, I could imagine this latest lightweight model looking appealing from a financial perspective.
How the average user will experience Gemini 3 Flash
Most of us using AI aren’t doing so as developers who need to worry about API pricing. The majority of Gemini users are likely experiencing the model through Google’s consumer products, like Search, Workspace, and the Gemini app.