Run AI locally, zero external dependencies
Double-click to install. Provides OpenAI-compatible API for any project. Vision / Embedding / Speech-to-Text — three services ready out of the box.
Choose your platform
macOS (Apple Silicon)
v1.0.0M1 / M2 / M3 / M4, built-in Metal GPU acceleration
5.83 GB
Download Now Windows (AMD64)
v1.0.0NVIDIA GPU (CUDA) recommended
—
Coming soon
Three services, one click
🖼️
Vision-Language (VLM)
Gemma 4 E4B Q4_K_M. Image description, text generation, multimodal Q&A. Endpoint: :8080/v1/chat/completions
🔢
Text Embedding
BGE-base-zh, 768-dim Chinese embeddings for semantic search and similarity. Endpoint: :8081/v1/embeddings
🎙️
Speech-to-Text (STT)
Whisper large-v3-turbo, multilingual transcription. 8x realtime on M4 Max. Endpoint: :8082/inference
System Requirements
| macOS (Apple Silicon) | Windows (AMD64) | |
|---|---|---|
| Processor | Apple Silicon M1+ | AMD64 / Intel x64 |
| Memory | 16 GB+ (32 GB recommended) | |
| GPU | Built-in Metal (auto) | NVIDIA GPU 8GB+ recommended |
| Disk | 8 GB (~6.3 GB models) | |
| OS | macOS 12+ | Windows 10 / 11 |
🔒 Security
All services bind to 127.0.0.1 only. No external network, no cloud cost, fully on-device. Client must run on the same machine as the engine.