Unfiltered intelligence.
Yours to run.
Open-weight language models, refusal-ablated and tuned to run on your own hardware. Desktop app for instant local chat. Self-hosted models. A managed API when you need it.
Join the pack on Discord
Built for unrestricted intelligence.
Refusal layers stripped. Weights open. Latency local. Bring your own GPU or call our managed API.
Local-first
Inference runs on your machine through Ollama. No prompts leave your hardware. No telemetry. No cloud round-trip.
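In practice, local-first means talking to Ollama's HTTP API on localhost. A minimal sketch with only the standard library; the model tag is a placeholder, so substitute whatever `ollama list` shows on your machine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, prompt: str) -> dict:
    # Non-streaming request body for Ollama's /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    # Everything stays on localhost: the prompt never leaves your hardware.
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Any OpenAI-style client that can point at a custom base URL works the same way against the local server.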
Refusal-ablated
Surgical removal of the refusal direction in activation space. Core reasoning preserved, "I can't help with that" deleted.
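Directional ablation of this kind can be sketched in a few lines of NumPy. This is an illustration of the general technique, not the actual ablation pipeline; `W` stands in for a model weight matrix and `r` for an estimated refusal direction in activation space:

```python
import numpy as np


def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Project the refusal direction r out of weight matrix W.

    Computes W' = (I - r r^T) W with r normalized to unit length,
    so activations along r are zeroed while everything orthogonal
    to r (the rest of the model's reasoning) passes through intact.
    """
    r = r / np.linalg.norm(r)      # unit refusal direction
    return W - np.outer(r, r) @ W  # remove the component along r
```

After ablation, projecting any output of `W'` onto the refusal direction yields zero; all orthogonal components are untouched.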
Hardware-aware
Auto-detects your VRAM and recommends a quantization that actually fits. From 4GB laptops to 24GB workstations.
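The recommendation logic can be sketched as a simple threshold check. The cutoffs below mirror the guidance on this page (F16 at ≥ 16 GB, Q8_0 at 8 GB, Q4_K_M below that), not the installer's actual detection code:

```python
def recommend_quant(vram_gb: float) -> str:
    # Illustrative thresholds: F16 wants >= 16 GB, Q8_0 fits an 8 GB GPU,
    # and Q4_K_M is the default when detection finds less than 8 GB.
    if vram_gb >= 16:
        return "F16"
    if vram_gb >= 8:
        return "Q8_0"
    return "Q4_K_M"
```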
Open weights
F16, Q8_0, and Q4_K_M GGUF artifacts. Download once, run anywhere llama.cpp runs. No license gates.
Up and running in under a minute.
Paste the one-line install command into PowerShell. The installer pulls WebView2, Ollama, the recommended model for your GPU, and the desktop app, then launches.
Pick your weight class.
Three uncensored model families in GGUF. Hosted on llm.cerberusai.dev.
Cerberus 4B v2 Abliterated
Complete refusal ablation. Total cognitive freedom. Built on Qwen 2.5 4B.
Cerberus 4B v2
Reference weights. Use when you want maximum fidelity and you have ≥ 16 GB VRAM to spare.
Download F16
Cerberus 4B v2
Best quality-to-size ratio. Indistinguishable from F16 in most generations. Fits on an 8 GB GPU.
Download Q8_0
Cerberus 4B v2
For laptops, low-VRAM builds, and anything tight on disk. Default pick when GPU detection finds < 8 GB.
Download Q4_K_M
Arbiter GL9b
9B GLM-4 base. Unfiltered and highly intelligent — when you need more reasoning headroom than a 4B can give.
Arbiter GL9b
Best quality-to-size ratio for the 9B class. Fits on a 6–8 GB GPU with room for context.
Download Q4_K_M
Arbiter GL9b
Sweet spot between Q2 and Q4 — strong quality at low memory. The 9B you can run on a 4 GB card.
Download Q3_K_M
Gamma3 1B Abliterated
Compact 1B parameter model. Lightweight enough for CPU-only inference, edge devices, and mobile — without the corporate guardrails.
Gamma3 1B
Reference weights for the 1B class. Maximum fidelity at a footprint smaller than most 4B Q4 builds.
Download F16
Gamma3 1B
Best quality-to-size ratio. Runs well on phones, Raspberry Pi-class hardware, and CPU-only laptops.
Download Q8_0
Gamma3 1B
For tight-budget edge deployment. Sub-1 GB on disk — the smallest weight class we host.
Download Q4_K_M
Skip the GPU. Just call the endpoint.
OpenAI-compatible. Streaming. Pay-as-you-go credits. Self-hosted control plane on access.cerberusai.dev — your keys, your usage, no hidden middlemen.
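Because the API is OpenAI-compatible, requests follow the standard chat-completions schema. A minimal payload builder; the model name is a placeholder, and the base URL and key come from your dashboard on access.cerberusai.dev:

```python
def chat_request(model: str, prompt: str, stream: bool = True) -> dict:
    # Request body in the OpenAI /v1/chat/completions shape the API mirrors.
    # POST this (with your API key as a Bearer token) to the inference
    # base URL shown in your dashboard.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
```

Any existing OpenAI SDK works unchanged once its base URL is pointed at the endpoint.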
Pay for usage. Skip the local rig.
Three monthly tiers. Each renews credits and unlocks Desktop App access. Mid + EXP also get premium model downloads (Q8/F16/Q4_K_M-9B).
Stripe and PayPal supported · Cancel anytime · 1 USD = 10,000 credits
- ✓ Solo use
- ✓ API key access
- ✓ Desktop App Access
- ✓ Great for testing

- ✓ Daily workflows
- ✓ More session headroom
- ✓ Better value per credit
- ✓ Best for builders
- ✓ Desktop App Access
- ✓ Premium model downloads

- ✓ Longest runtime cushion
- ✓ Ideal for larger prompts
- ✓ Best for repeated API calls
- ✓ Priority-ready posture
- ✓ Desktop App Access
- ✓ Premium model downloads
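The credit math from the pricing note above is a fixed conversion, sketched here for budgeting:

```python
CREDITS_PER_USD = 10_000  # from the pricing note: 1 USD = 10,000 credits


def usd_to_credits(usd: float) -> int:
    # How many credits a top-up buys.
    return round(usd * CREDITS_PER_USD)


def credits_to_usd(credits: int) -> float:
    # What a given credit balance is worth in dollars.
    return credits / CREDITS_PER_USD
```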
Join the pack.
Builders, researchers, and people who think "as a language model" is a refusal. Trade prompts, models, and benchmarks.