100% Offline Private AI Assistant — Your Hardware, Your Data, Your Rules
No Ollama, no Docker, no cloud APIs. Python loads model files directly.
All data stays on your disk. Nothing leaves your machine — ever.
After one-time model download, zero network calls. Works in airplane mode.
Uses YOUR CPU, YOUR GPU, YOUR RAM. You own the entire stack.
Conversations like ChatGPT with persistent JSON history. Dynamic context pruning keeps memory efficient. (`chat_engine.py` · 358 lines)

4-layer pipeline: RAG retrieval → recursive decomposition → self-refinement → adaptive chain-of-thought prompting.
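The four layers compose into a single orchestration step. Here is a minimal sketch of that flow, assuming `llm` and `retrieve` are stand-in callables (hypothetical names, not the project's actual API):

```python
def answer(question, llm, retrieve):
    """Sketch of the 4-layer flow; `llm` and `retrieve` are caller-supplied stubs."""
    # Layer 1 - RAG retrieval: ground the question in local documents
    context = retrieve(question)
    # Layer 2 - recursive decomposition: split into sub-questions, answer each
    subs = [s for s in llm(f"Decompose: {question}").splitlines() if s.strip()]
    partials = [llm(f"Context: {context}\nSub-question: {s}") for s in subs]
    # Layer 3 - self-refinement: draft, critique, then revise
    draft = llm(f"Combine: {partials}")
    critique = llm(f"Critique: {draft}")
    revised = llm(f"Revise '{draft}' using: {critique}")
    # Layer 4 - adaptive chain-of-thought: extra reasoning only for multi-part questions
    if len(subs) > 1:
        revised = llm(f"Think step by step, then answer: {question}\nDraft: {revised}")
    return revised
```

Each layer is just another prompt to the same local model, so the whole pipeline stays offline.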
(`context_engine.py` · 498 lines)

Upload PDFs/text, chunk & embed via sentence-transformers, query with ChromaDB cosine similarity.
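The real pipeline uses sentence-transformers embeddings stored in ChromaDB; as a rough illustration of the chunk → embed → query flow, here is a dependency-free sketch with a toy bag-of-words "embedding" and plain cosine similarity (all names hypothetical):

```python
import math
from collections import Counter

def chunk(text, size=200, overlap=40):
    # fixed-size character windows with overlap, typical for RAG ingestion
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    # bag-of-words stand-in for a sentence-transformers embedding vector
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(n * b[t] for t, n in a.items())
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def query(chunks, question, top_k=2):
    # rank stored chunks by cosine similarity to the question, as ChromaDB does
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:top_k]
```

Swap `embed` for a real model and `query` for a ChromaDB collection and the shape of the code stays the same.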
(`knowledge_base.py` · 260 lines)

Write, debug, explain code with CodeLlama, Qwen, StarCoder2, OpenCoder and DeepSeek Coder models. (5 code-specialized models)

Mix local & cloud AI. Configure OpenAI, Groq, or Together API keys. Keys stored locally, never leave. (`api.py` · `/chat/cloud` endpoint)

Switch between 28 models on-the-fly. Aggressive GC + EmptyWorkingSet clears RAM/VRAM between loads.
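A minimal sketch of that cleanup pattern (helper name hypothetical; `EmptyWorkingSet` is the Win32 psapi call, so that branch only runs on Windows):

```python
import ctypes
import gc
import platform

def free_model_memory():
    """Hypothetical helper showing the cleanup pattern between model loads."""
    collected = gc.collect()  # reclaim cyclic garbage still holding tensor buffers
    if platform.system() == "Windows":
        # ask Windows to trim the process working set (psapi.EmptyWorkingSet)
        handle = ctypes.windll.kernel32.GetCurrentProcess()
        ctypes.windll.psapi.EmptyWorkingSet(handle)
    return collected
```

Freeing VRAM additionally needs a backend-specific call, e.g. `torch.cuda.empty_cache()` if the inference stack runs on PyTorch.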
(`llm_engine.py` · 375 lines)

| Model | Parameters | RAM | Best For | License |
|---|---|---|---|---|
| Phi-4 Mini | 3.8B | ~3.5 GB | General chat, reasoning | Restricted |
| DeepSeek-R1 (Qwen) | 7B | ~6.5 GB | Step-by-step reasoning | Open Weights |
| Llama 3.2 | 1B / 3B | ~2-3 GB | Fast, lightweight tasks | Restricted |
| Qwen 3 | 1.7B / 4B | ~2-4 GB | Multilingual, coding | Open Weights |
| CodeLlama | 7B | ~5.5 GB | Code generation | Restricted |
| Mistral 7B | 7B | ~6 GB | General purpose | Open Weights |
| OLMo 3 | 1B / 7B | ~1-6 GB | Research, open science | Fully Open |
| Falcon 3 | 1B / 7B | ~1-6 GB | General, multilingual | Open Weights |
| StarCoder2 | 3B | ~3 GB | Code completion | Open Weights |
| Gemma 2 / 3 | 2B-4B | ~2-4 GB | Instruction following | Restricted |
+ 18 more models including Pythia, GPT-NeoX, Cerebras-GPT, RWKV, MPT, YaLM — all Q4_K_M quantized
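The RAM figures above roughly track quantized weight size plus runtime overhead: Q4_K_M averages about 4.8-4.9 bits per weight, which gives a back-of-envelope estimate like the following (the flat overhead allowance is an assumption; real usage grows with context length):

```python
def q4_k_m_ram_gb(params_billion, bits_per_weight=4.85, overhead_gb=1.0):
    """Rough RAM estimate: quantized weights plus a flat runtime/KV-cache allowance."""
    weights_gb = params_billion * bits_per_weight / 8  # GB, since params are in billions
    return round(weights_gb + overhead_gb, 1)
```

For a 3.8B model this lands near 3.3 GB, in line with the Phi-4 Mini row; 7B models come out slightly under the table's figures because larger contexts inflate the KV cache.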
Electron + React with bundled Python 3.11. One-click .exe installer via NSIS. Native window, splash screen. (Windows · Electron Builder)

React Native Expo app connecting over LAN. Model switching, persistent settings. EAS cloud builds to .apk. (Android · Expo + EAS)

Gradio legacy UI on port 7865. Next.js 16 marketing site with dark glassmorphism theme and Framer Motion. (Browser · Next.js 16)

LLM inference, chat history, RAG documents, vector database — all stored locally. Zero telemetry, zero analytics, zero cloud. The only network call is the one-time model download from HuggingFace. Cloud proxy is entirely opt-in.
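That "one network call, ever" guarantee boils down to a cache-or-fetch check. A stdlib-only sketch (URL and helper name hypothetical; the project may fetch via huggingface_hub instead):

```python
from pathlib import Path
from urllib.request import urlretrieve

def ensure_model(url: str, cache_dir: str = "models") -> Path:
    """Download a model file once; every later call is an offline cache hit."""
    target = Path(cache_dir) / url.rsplit("/", 1)[-1]
    if target.exists():
        return target          # offline path: no socket is ever opened
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, target)   # the single one-time network call
    return target
```

Once the file is on disk, the function never touches the network again, so the app keeps working in airplane mode.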