Google: Gemma 3n 4B

google/gemma-3n-e4b-it

Description

Gemma 3n E4B-it is optimized for efficient execution on mobile and other low-resource devices, such as phones, laptops, and tablets. It accepts multimodal inputs, including text, visual data, and audio, which enables tasks such as text generation, speech recognition, translation, and image analysis. Using innovations such as Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n manages memory usage and computational load by selectively activating model parameters, which significantly reduces runtime resource requirements. The model covers a wide linguistic range (trained on data in over 140 languages) and has a 32K-token context window. Because it can load only the parameters that a given task or device needs, it is well suited to privacy-focused, offline-capable applications and on-device AI solutions.
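The idea behind MatFormer-style nesting can be illustrated with a toy feed-forward layer. Smaller sub-models reuse a prefix of the full layer's hidden units, so choosing a smaller width activates fewer parameters without storing a separate model. The sketch below is a simplified illustration under stated assumptions (tiny random weights, a ReLU activation, plain Python lists). The names `make_ffn`, `nested_ffn`, and `active_params` are hypothetical and do not reflect Gemma 3n's actual implementation.

```python
import random


def make_ffn(d_model, d_ff, seed=0):
    """Build a toy FFN (hypothetical helper): w_in is d_ff x d_model, w_out is d_model x d_ff."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(d_model)] for _ in range(d_ff)]
    w_out = [[rng.uniform(-1.0, 1.0) for _ in range(d_ff)] for _ in range(d_model)]
    return w_in, w_out


def nested_ffn(x, w_in, w_out, width):
    """Run the FFN using only the first `width` hidden units (a nested sub-model).

    ReLU is an assumption chosen for simplicity.
    """
    if not 1 <= width <= len(w_in):
        raise ValueError("width must be between 1 and the full hidden size")
    # Only the first `width` rows of w_in are touched.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(w_in[j], x))) for j in range(width)]
    # Only the first `width` columns of w_out are touched.
    return [sum(row[j] * hidden[j] for j in range(width)) for row in w_out]


def active_params(d_model, width):
    """Number of weights a sub-model of the given width uses (biases ignored)."""
    return 2 * d_model * width


if __name__ == "__main__":
    d_model, d_ff = 4, 8
    w_in, w_out = make_ffn(d_model, d_ff)
    x = [0.5, -1.0, 0.25, 2.0]
    for width in (2, 4, 8):
        y = nested_ffn(x, w_in, w_out, width)
        print(f"width={width}: active params={active_params(d_model, width)}, "
              f"output={[round(v, 3) for v in y]}")
```

In the MatFormer design, this prefix-slicing idea is applied inside the feed-forward blocks of the Transformer layers. The toy version only shows why a narrower sub-model touches proportionally fewer weights.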

How this model compares

The Overall view covers the full catalog. The By plan view covers only models available on that plan tier, using the same rules that determine which models appear in your list. Each chart positions this model on a min–average–max scale. Prices use the higher of the prompt or completion per-token price, shown per 1M tokens.
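The two rules in the paragraph above can be sketched as follows: the displayed price is the higher of the prompt and completion per-token prices, scaled to 1M tokens, and the model is then placed on the min–max range. The function names are hypothetical. The per-token inputs in the example are illustrative rather than this model's actual split. The linear position is also an assumption, because the page does not say whether the scale is linear or logarithmic.

```python
from decimal import Decimal

TOKENS_PER_DISPLAY_UNIT = Decimal(1_000_000)


def display_price(prompt_per_token: Decimal, completion_per_token: Decimal) -> Decimal:
    """Price shown on the page: the higher per-token price, expressed per 1M tokens."""
    return max(prompt_per_token, completion_per_token) * TOKENS_PER_DISPLAY_UNIT


def linear_position(value: Decimal, lo: Decimal, hi: Decimal) -> Decimal:
    """Fraction of the way from `lo` to `hi` (0 = min, 1 = max), assuming a linear scale."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (value - lo) / (hi - lo)


if __name__ == "__main__":
    # Hypothetical per-token prices; only the resulting $0.12 / 1M matches this page.
    price = display_price(Decimal("0.00000002"), Decimal("0.00000012"))
    print(price)  # 0.12000000
    pos = linear_position(price, Decimal("0.04"), Decimal("750.00"))
    print(f"{pos:.6f}")
```

`Decimal` is used instead of `float` so that small per-token prices scale to per-1M figures without rounding noise.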

Price (per 1M tokens), across 336 models in this group
Min: $0.04
Avg: $12.38
Max: $750.00
This model: $0.12 / 1M tokens
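The displayed figure is the higher of the prompt and completion prices, so it gives a simple upper bound on what a request can cost: total tokens times $0.12 per 1M. The sketch below computes that bound. `max_request_cost` is a hypothetical helper, and actual billing may be lower when prompt tokens are priced below completion tokens.

```python
from decimal import Decimal

# Displayed price for this model, in dollars per 1M tokens (from the price figures above).
PRICE_PER_1M = Decimal("0.12")


def max_request_cost(prompt_tokens: int, completion_tokens: int,
                     price_per_1m: Decimal = PRICE_PER_1M) -> Decimal:
    """Upper bound on request cost: every token billed at the higher (displayed) rate."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = prompt_tokens + completion_tokens
    return Decimal(total) * price_per_1m / Decimal(1_000_000)


if __name__ == "__main__":
    # Filling the full 32,768-token context window costs at most this much.
    print(max_request_cost(30_000, 2_768))  # 0.00393216
```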

Context length (tokens), across 336 models in this group
Min: 4,095 tokens
Avg: 382,115 tokens
Max: 10,000,000 tokens
This model: 32,768 tokens
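One practical consequence of the 32,768-token limit is that you should check, before sending a request, that the prompt plus the requested output fits. The sketch below assumes that input and output tokens share the window. This is common for chat-completion APIs but should be confirmed for the provider in use. `fits_context` and `max_output_budget` are hypothetical helpers, and token counts must come from the model's own tokenizer.

```python
CONTEXT_LENGTH = 32_768  # this model's context window, in tokens (2**15)


def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_length: int = CONTEXT_LENGTH) -> bool:
    """True if prompt plus requested output fit in the window (assumes a shared window)."""
    return prompt_tokens + max_output_tokens <= context_length


def max_output_budget(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Largest output allowance left after the prompt, never negative."""
    return max(0, context_length - prompt_tokens)


if __name__ == "__main__":
    print(fits_context(30_000, 2_768))   # True
    print(fits_context(30_000, 4_096))   # False
    print(max_output_budget(30_000))     # 2768
```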

Capabilities

Text → Text
Context: 32,768 tokens
Input: Text
Output: Text