Cerberus AI
Local · Uncensored · Open

Unfiltered intelligence.
Yours to run.

Open-weight language models, refusal-ablated and tuned to run on your own hardware. Desktop app for instant local chat. Self-hosted models. A managed API when you need it.

Join the pack on Discord
Why Cerberus

Built for unrestricted intelligence.

Refusal layers stripped. Weights open. Latency local. Bring your own GPU or call our managed API.

Local-first

Inference runs on your machine through Ollama. No prompts leave your hardware. No telemetry. No cloud round-trip.
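Since inference goes through Ollama, the local loop can be exercised directly against Ollama's REST API on its default port. A minimal stdlib sketch; the model tag is the one the installer output shows below, so swap in whichever quantization was pulled for your GPU:

```python
import json
import urllib.request

# Ollama's local chat endpoint; the request never leaves your machine.
URL = "http://localhost:11434/api/chat"

# Model tag assumed from the installer output; adjust to the
# quantization the installer actually pulled for your hardware.
payload = {
    "model": "cerberus-4b-v2-abliterated:Q4_K_M",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,  # one JSON response instead of a token stream
}

def chat(payload: dict) -> str:
    """POST one chat turn to the local Ollama server."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Calling `chat(payload)` requires a running Ollama instance with the model pulled; everything else is plain standard library.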

Refusal-ablated

Surgical removal of the refusal direction in activation space. Core reasoning preserved, "I can't help with that" deleted.

Hardware-aware

Auto-detects your VRAM and recommends a quantization that actually fits. From 4GB laptops to 24GB workstations.
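The selection logic isn't published, but the thresholds implied by the model cards below (F16 wants ≥ 16 GB, Q8_0 fits 8 GB, Q4_K_M is the default under 8 GB) suggest a sketch like this; the function name and cutoffs are illustrative, not the installer's actual code:

```python
def recommend_quant(vram_gb: float) -> str:
    """Pick a 4B quantization that fits the detected VRAM.

    Thresholds mirror the model cards: F16 for >= 16 GB,
    Q8_0 for an 8 GB GPU, Q4_K_M below that.
    (Illustrative only; the real heuristic may differ.)
    """
    if vram_gb >= 16:
        return "F16"
    if vram_gb >= 8:
        return "Q8_0"
    return "Q4_K_M"
```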

Open weights

F16, Q8_0, and Q4_K_M GGUF artifacts. Download once, run anywhere llama.cpp runs. No license gates.
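A rough rule of thumb for GGUF file sizes: parameters times effective bits per weight. Using typical llama.cpp averages (F16 ≈ 16.0, Q8_0 ≈ 8.5, Q4_K_M ≈ 4.85 bits per weight, metadata overhead ignored), a 4B model lands close to the sizes listed below:

```python
def gguf_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Estimate on-disk GGUF size in GiB: params x bits / 8.

    Bits-per-weight figures are typical llama.cpp averages,
    not exact per-model numbers; metadata overhead is ignored.
    """
    size_bytes = params_billion * 1e9 * bits_per_weight / 8
    return size_bytes / 2**30

# Estimates for a 4B model, one per quantization level.
sizes = {q: round(gguf_size_gib(4.0, bpw), 1)
         for q, bpw in [("F16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]}
```

The estimates come out near 7.5, 4.0, and 2.3 GiB, in line with the download sizes in the model cards.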

One-line install

Up and running in under a minute.

Paste it into PowerShell. The installer pulls WebView2, Ollama, the recommended model for your GPU, and the desktop app — then launches.

PS>  
✓ WebView2 runtime detected
✓ Ollama installed
✓ Pulled cerberus-4b-v2-abliterated:Q4_K_M (2.5 GB)
✓ Cerberus Desktop launched
Models

Pick your weight class.

Three uncensored model families in GGUF. Hosted on llm.cerberusai.dev.

Cerberus 4B v2 Abliterated

Complete refusal ablation. Total cognitive freedom. Built on Qwen 2.5 4B.

F16 · full ~7.5 GB

Cerberus 4B v2

F16 · full precision

Reference weights. Use when you want maximum fidelity and you have ≥ 16 GB VRAM to spare.

Download F16
Q8_0 · recommended ~4.0 GB

Cerberus 4B v2

Q8_0 · 8-bit quantized

Best quality-to-size ratio. Indistinguishable from F16 in most generations. Fits on an 8 GB GPU.

Download Q8_0
Q4_K_M · compact ~2.5 GB

Cerberus 4B v2

Q4_K_M · 4-bit quantized

For laptops, low-VRAM builds, and anything tight on disk. Default pick when GPU detection finds < 8 GB.

Download Q4_K_M

Arbiter GL9b

9B GLM-4 base. Unfiltered and highly intelligent — when you need more reasoning headroom than a 4B can give.

Q4_K_M · recommended ~5.8 GB

Arbiter GL9b

Q4_K_M · 4-bit quantized

Best quality-to-size ratio for the 9B class. Fits on a 6–8 GB GPU with room for context.

Download Q4_K_M
Q3_K_M · compact ~4.7 GB

Arbiter GL9b

Q3_K_M · 3-bit quantized

Sweet spot between Q2 and Q4 — strong quality at low memory. The 9B you can run on a 4 GB card.

Download Q3_K_M

Gamma3 1B Abliterated

Compact 1B parameter model. Lightweight enough for CPU-only inference, edge devices, and mobile — without the corporate guardrails.

F16 · full ~2.0 GB

Gamma3 1B

F16 · full precision

Reference weights for the 1B class. Maximum fidelity at a footprint smaller than most 4B Q4 builds.

Download F16
Q8_0 · recommended ~1.07 GB

Gamma3 1B

Q8_0 · 8-bit quantized

Best quality-to-size ratio. Runs well on phones, Raspberry Pi-class hardware, and CPU-only laptops.

Download Q8_0
Q4_K_M · compact ~806 MB

Gamma3 1B

Q4_K_M · 4-bit quantized

For tight-budget edge deployment. Sub-1 GB on disk — the smallest weight class we host.

Download Q4_K_M
Managed API

Skip the GPU. Just call the endpoint.

OpenAI-compatible. Streaming. Pay-as-you-go credits. Self-hosted control plane on access.cerberusai.dev — your keys, your usage, no hidden middlemen.

$ curl https://api.cerberusai.dev/v1/chat/completions \
-H "Authorization: Bearer $CRB_KEY" \
-d '{"model":"cerberus-4b-v2","messages":[...]}'
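The same call from Python, stdlib only; since the endpoint is OpenAI-compatible, the standard `openai` client also works by pointing its `base_url` at the same URL. Model name and `CRB_KEY` environment variable follow the curl example above:

```python
import json
import os
import urllib.request

API_URL = "https://api.cerberusai.dev/v1/chat/completions"

def complete(prompt: str, model: str = "cerberus-4b-v2") -> str:
    """Send one chat completion to the managed, OpenAI-compatible API."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            # Same key as the curl example; export CRB_KEY first.
            "Authorization": f"Bearer {os.environ['CRB_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]
```

Running `complete("...")` requires a valid API key in `CRB_KEY`; the response shape is the standard OpenAI `choices[0].message.content` layout.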
Pricing

Pay for usage. Skip the local rig.

Three monthly tiers. Each renews its credit balance every month and includes Desktop App access. Mid and EXP also include premium model downloads (Q8/F16/Q4_K_M-9B).

Stripe and PayPal supported · Cancel anytime · 1 USD = 10,000 credits

Fast Start
Lite
$8/mo
80,000 credits / month
  • Solo use
  • API key access
  • Desktop App Access
  • Great for testing
Choose Lite →
Most Balanced
Mid
$15/mo
180,000 credits / month
★ Premium models included
  • Daily workflows
  • More session headroom
  • Better value per credit
  • Best for builders
  • Desktop App Access
  • Premium model downloads
Choose Mid →
Heavy Usage
EXP
$22/mo
320,000 credits / month
★ Premium models included
  • Longest runtime cushion
  • Ideal for larger prompts
  • Best for repeated API calls
  • Priority-ready posture
  • Desktop App Access
  • Premium model downloads
Choose EXP →
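The per-credit value of each tier follows directly from the listed prices and the flat top-up rate of 10,000 credits per USD; computing it makes the "better value per credit" claim concrete:

```python
# (price in USD, monthly credits) per tier, from the pricing table.
TIERS = {"Lite": (8, 80_000), "Mid": (15, 180_000), "EXP": (22, 320_000)}

# Effective credits per dollar; the flat top-up rate is 10,000.
rate = {name: credits / usd for name, (usd, credits) in TIERS.items()}
# Lite matches the flat rate; Mid and EXP come out ahead of it.
```

Lite works out to the flat 10,000 credits per dollar, Mid to 12,000, and EXP to roughly 14,545.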

Join the pack.

Builders, researchers, and people who think "as a language model" is a refusal. Trade prompts, models, and benchmarks.

Join Discord

Secure subscriptions and one-time top-ups via Stripe and PayPal. Card processing, refunds, and storage handled by the payment provider.