Block Diffusion

DiffusionGemma 26B-A4B, Fully Explained: The Open-Weight Frontier Opened Up by Google's "Block-Parallel" Text Diffusion, Which Abandons Autoregression

Open-Weight Frontier: Model 7

The Limits of Autoregression and the Return to Text Diffusion

On June 10, 2026, Google DeepMind released DiffusionGemma under the Apache 2.0 license. It is a "block-parallel decoding" model that combines the MoE backbone of Gemma 4 26B A4B with the research behind Gemini Diffusion, which was announced at I/O last May. Its central claim is clear: local inference at more than 1,000 tok/s on an H100 and more than 700 tok/s on a GeForce RTX 5090. Google's official description reads as follows:

"Most language models act like a typewriter, generating one token at a time from left to right. In the cloud, this is efficient because servers can batch thousands of user requests together to share the hardware load. But when run locally for a single user, this word-by-word process leaves your dedicated GPU or TPU underutilized. DiffusionGemma reverses this inefficiency. Instead of predicting words sequentially, it drafts an entire 256-token paragraph simultaneously."
― Brendan O'Donoghue & Sebastian Flennerhag, Google Research ...
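The typewriter-versus-paragraph contrast can be made concrete by counting model calls. The following is a minimal toy sketch, assuming a MaskGIT-style decoder that unmasks the most confident positions on a linear schedule. That is a generic masked-diffusion decoding technique, not DiffusionGemma's published algorithm. `toy_denoiser`, `TARGET`, `steps=32`, and the commit schedule are all hypothetical; only the 256-token block size comes from the quote. Autoregressive decoding needs one forward pass per token, while the block-parallel loop needs one pass per refinement step, and each pass scores the whole block at once. That per-pass parallelism is what keeps a single user's GPU busy.

```python
import math
from typing import Callable, List, Optional, Tuple

MASK: Optional[str] = None  # marks a position that has not been decoded yet

# Hypothetical toy "ground truth" that both decoders below converge to.
TARGET: List[str] = [f"tok{i}" for i in range(256)]


def toy_denoiser(block: List[Optional[str]]) -> List[Tuple[str, float]]:
    """Hypothetical denoiser: proposes a (token, confidence) pair for every
    position of the block in a single call. A real model would condition on
    the already committed tokens; this toy simply favours earlier positions."""
    return [(TARGET[i], 1.0 / (1 + i)) for i in range(len(block))]


def autoregressive_decode(next_token: Callable[[List[str]], str],
                          length: int) -> Tuple[List[str], int]:
    """Typewriter-style decoding: one forward pass per generated token."""
    out: List[str] = []
    passes = 0
    for _ in range(length):
        out.append(next_token(out))
        passes += 1
    return out, passes


def block_parallel_decode(
        denoiser: Callable[[List[Optional[str]]], List[Tuple[str, float]]],
        block_size: int,
        steps: int) -> Tuple[List[Optional[str]], int]:
    """Draft a whole block of masked tokens at once, then commit the most
    confident masked positions at each refinement step (linear schedule)."""
    block: List[Optional[str]] = [MASK] * block_size
    passes = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(block) if tok is MASK]
        if not masked:
            break
        proposals = denoiser(block)  # one pass scores every position in parallel
        passes += 1
        # After step s, about (s + 1) / steps of the block should be committed.
        target_done = math.ceil(block_size * (step + 1) / steps)
        n_commit = target_done - (block_size - len(masked))
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:n_commit]:
            block[i] = proposals[i][0]
    return block, passes


if __name__ == "__main__":
    ar_out, ar_passes = autoregressive_decode(lambda prefix: TARGET[len(prefix)], 256)
    bd_out, bd_passes = block_parallel_decode(toy_denoiser, block_size=256, steps=32)
    assert ar_out == bd_out == TARGET
    print(ar_passes, bd_passes)  # 256 32
```

Both decoders produce the same 256 tokens, but the block-parallel loop uses 32 model calls instead of 256. Pass count is only a proxy. Real throughput also depends on how expensive each full-block pass is and on how many refinement steps the actual model needs, which this sketch does not model.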