Google Gemma 4: The Most Capable Open AI Models You Can Run Locally
Google just handed the open-source AI community its most powerful toolkit yet. On April 2, 2026, the company released Gemma 4 — a family of four open models built directly from the same research that powers Gemini 3, and for the first time in the Gemmaverse, licensed under the fully permissive Apache 2.0 license.
What Is Gemma — and Why Does the License Matter?
Gemma has been Google’s answer to the growing demand for capable, locally runnable AI. Since the first generation launched, developers have downloaded Gemma models more than 400 million times, spawning over 100,000 community variants. The series sits in a unique position: frontier-class research distilled into smaller models that individuals and businesses can actually run on their own hardware.
The shift to Apache 2.0 is significant. Previous Gemma releases used a custom license that carried restrictions. Apache 2.0 — an OSI-approved open-source license — gives developers explicit permission to download, modify, fine-tune, and deploy Gemma 4 models for commercial purposes with minimal obligations. That’s a meaningful signal that Google is serious about winning the open ecosystem.
What’s New in Gemma 4
Gemma 4 arrives in four sizes to cover everything from smartphones to servers. The 2B and 4B “Effective” models are engineered for on-device deployment — running fully offline on phones, Raspberry Pi boards, and devices like the NVIDIA Jetson Orin Nano. The 26B Mixture-of-Experts (MoE) and 31B Dense variants are aimed at more powerful machines and enterprise workloads.
Key upgrades across the family include a 256K-token context window, native multimodal capabilities (text, vision, and audio), and support for over 140 languages. Google says the models are up to 4x faster than their predecessors and use up to 60% less battery — a direct win for anyone running inference on-device. Agentic workflows, complex multi-step reasoning, and offline code generation round out the feature set, as reported by Google’s official Gemma 4 announcement.
Why It Matters
The open-source AI race just got more competitive. With Gemma 4 under Apache 2.0, Google is positioning itself as the enterprise-friendly alternative to Meta’s Llama family — while still maintaining a closed frontier through Gemini. For developers building agentic apps, RAG pipelines, or on-device features, Gemma 4 offers a commercially safe foundation with benchmark performance that tracks closely with the models powering Google’s own products.
The on-device story is equally important. As AI moves off the cloud and onto consumer hardware, the 2B and 4B Effective models are now among the most capable options for private, offline inference. Android integration is already in developer preview via AICore, signaling that Gemma 4 could be the engine running AI features natively on hundreds of millions of Android devices in the near future.
Where to Get It
Gemma 4 is available today on Google AI Studio, Hugging Face, Kaggle, Ollama, and Google Cloud. The full model lineup — including weights, documentation, and fine-tuning guides — is accessible directly from Google DeepMind’s Gemma 4 page.
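If you already use Ollama, trying one of the smaller variants locally is a one-liner. A minimal sketch is below — note that the model tag `gemma4:4b` is an illustrative placeholder, not a confirmed identifier; check the Ollama model library for the actual name before running.

```shell
# Pull the 4B on-device variant and start an interactive session.
# NOTE: "gemma4:4b" is a placeholder tag -- confirm the real model
# name in the Ollama library first.
ollama pull gemma4:4b
ollama run gemma4:4b "Summarize the Apache 2.0 license in one sentence."
```

The same weights can alternatively be downloaded from Hugging Face or Kaggle for use with your own inference stack.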
Continue Reading: Google’s full Gemma 4 announcement →
