This demo simulates Flappy Bird entirely through a diffusion world model rather than hand-coded game logic. I wanted to explore using WebGPU compute shaders to run diffusion world models directly in the browser, as a proof of concept for on-device games that run at reasonable frame rates.
It runs on any device that supports WebGPU, on both desktop and mobile. Note that WebGPU is being enabled by default only in the newest versions of iOS, so on older mobile versions we fall back to WASM.
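The WebGPU-or-WASM decision described above could be made with a feature check along these lines. This is a minimal sketch under stated assumptions, not the demo's actual code: `pickBackend`, `GpuLike`, `AdapterLike`, and `BackendChoice` are hypothetical names, and the sketch assumes that f16 shader support is detected through the standard WebGPU `shader-f16` adapter feature.

```typescript
// Hypothetical helper types: a minimal structural view of the parts of
// navigator.gpu that this sketch touches, so it can also run outside a browser.
interface AdapterLike {
  features: { has(name: string): boolean };
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

interface BackendChoice {
  backend: "webgpu" | "wasm";
  // True only when the WebGPU adapter advertises the "shader-f16" feature.
  shaderF16: boolean;
}

// Assumption: fall back to WASM whenever WebGPU is missing or no adapter
// is available; the real demo may apply additional device-specific checks.
async function pickBackend(gpu: GpuLike | undefined): Promise<BackendChoice> {
  if (!gpu) {
    return { backend: "wasm", shaderF16: false };
  }
  const adapter = await gpu.requestAdapter();
  if (!adapter) {
    return { backend: "wasm", shaderF16: false };
  }
  return { backend: "webgpu", shaderF16: adapter.features.has("shader-f16") };
}
```

In a browser this would be called with `navigator.gpu`, which is `undefined` where WebGPU is not available. Taking the object as a parameter keeps the check easy to exercise with a stub.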
Current Stats (WebGPU Chrome)
- Model specs: latent denoiser ~5M params, autoencoder ~60K params
- 2023 MacBook Pro M2 Pro: 28-31 FPS (f16 precision)
- 2023 MacBook Pro M2 Pro: 23-26 FPS (f32 precision)
- iPhone 14 Pro: 13-15 FPS (f16 precision, WASM fallback)
- iPhone 14 Pro: 7-9 FPS (f32 precision, WASM fallback)
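To relate these frame rates to the 100+ FPS roadmap target, it can help to think in terms of per-frame time budget. The snippet below is only an illustrative conversion; `frameBudgetMs` is a hypothetical helper, not part of the demo.

```typescript
// Hypothetical helper: milliseconds available per frame at a given frame rate.
function frameBudgetMs(fps: number): number {
  if (!(fps > 0)) {
    throw new RangeError("fps must be a positive number");
  }
  return 1000 / fps;
}

// A 100 FPS target leaves 10 ms for the full denoise-and-decode step.
const targetBudget = frameBudgetMs(100); // 10

// The 28-31 FPS measured at f16 corresponds to roughly 32-36 ms per frame.
const currentSlow = frameBudgetMs(28);
const currentFast = frameBudgetMs(31);
```

Under this framing, reaching the target means cutting the per-frame cost on the MacBook to roughly a third of its current value.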
Controls
- Space: Flap (or click the Flap button, or tap the screen)
- R: In-game reset (hold the Reset button or the R key)
- P: Play or pause the game
- Spawn: Reinitialize the world model
- L: Open the Logs window (view FPS and performance)
Roadmap
- Add fallback support for WebGL to avoid WebGPU bugs such as WebAssembly memory limits and WASM-SIMD footguns that affect specific devices
- Fix performance drops over time (workaround: press Spawn or pause/resume)
- Fix the denoising steps issue to get more diverse pipe generation
- Fix the bird occasionally ignoring gravity and showing visual glitches after hitting pipes
- Add reset state for pipe collision
- Improve autoencoder reconstruction accuracy
- Integrate audio conditioning into the diffusion world model architecture
- Optimize performance to 100+ FPS (KV caching, self-forcing, model architecture improvements)
- Implement ControlNet pipeline for real-time environment restyling via text prompts