Run AI locally, zero external dependencies

Double-click to install. Provides OpenAI-compatible API for any project. Vision / Embedding / Speech-to-Text — three services ready out of the box.

Choose your platform

v1.0.0

M1 / M2 / M3 / M4, built-in Metal GPU acceleration

5.83 GB

v1.0.0

NVIDIA GPU (CUDA) recommended

—

Coming soon

🖼️

Gemma 4 E4B Q4_K_M. Image description, text generation, multimodal Q&A. Endpoint: :8080/v1/chat/completions

🔢

BGE-base-zh, 768-dim Chinese embeddings for semantic search and similarity. Endpoint: :8081/v1/embeddings

🎙️

Whisper large-v3-turbo, multilingual transcription. 8x realtime on M4 Max. Endpoint: :8082/inference

All services bind to 127.0.0.1 only. No external network, no cloud cost, fully on-device. Client must run on the same machine as the engine.