Unfiltered intelligence.
Yours to run.
Open-weight language models, refusal-ablated and tuned to run on your own hardware. Desktop app for instant local chat. Self-hosted models. A managed API when you need it.
Join the pack on Discord
Built for unrestricted intelligence.
Refusal layers stripped. Weights open. Latency local. Bring your own GPU or call our managed API.
Local-first
Inference runs on your machine through Ollama. No prompts leave your hardware. No telemetry. No cloud round-trip.
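In practice, local-first means talking to Ollama's HTTP API on localhost. A minimal sketch with only the standard library; the model tag is a placeholder, so substitute whatever `ollama list` shows on your machine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, prompt: str) -> dict:
    # Non-streaming request body for Ollama's /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    # Everything stays on localhost: the prompt never leaves your hardware.
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Any OpenAI-style client that can point at a custom base URL works the same way against the local server.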
Refusal-ablated
Surgical removal of the refusal direction in activation space. Core reasoning preserved, "I can't help with that" deleted.
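Directional ablation of this kind can be sketched in a few lines of NumPy. This is an illustration of the general technique, not the actual ablation pipeline; `W` stands in for a model weight matrix and `r` for an estimated refusal direction in activation space:

```python
import numpy as np


def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Project the refusal direction r out of weight matrix W.

    Computes W' = (I - r r^T) W with r normalized to unit length,
    so activations along r are zeroed while everything orthogonal
    to r (the rest of the model's reasoning) passes through intact.
    """
    r = r / np.linalg.norm(r)      # unit refusal direction
    return W - np.outer(r, r) @ W  # remove the component along r
```

After ablation, projecting any output of `W'` onto the refusal direction yields zero; all orthogonal components are untouched.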
Hardware-aware
Auto-detects your VRAM and recommends a quantization that actually fits. From 4GB laptops to 24GB workstations.
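The recommendation logic can be sketched as a simple threshold check. The cutoffs below mirror the guidance on this page (F16 at ≥ 16 GB, Q8_0 at 8 GB, Q4_K_M below that), not the installer's actual detection code:

```python
def recommend_quant(vram_gb: float) -> str:
    # Illustrative thresholds: F16 wants >= 16 GB, Q8_0 fits an 8 GB GPU,
    # and Q4_K_M is the default when detection finds less than 8 GB.
    if vram_gb >= 16:
        return "F16"
    if vram_gb >= 8:
        return "Q8_0"
    return "Q4_K_M"
```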
Open weights
F16, Q8_0, and Q4_K_M GGUF artifacts. Download once, run anywhere llama.cpp runs. No license gates.
Up and running in under a minute.
Paste the one-line install command into PowerShell. The installer pulls WebView2, Ollama, the recommended model for your GPU, and the desktop app, then launches.
Pick your weight class.
Three uncensored model families in GGUF. Hosted on llm.cerberusai.dev.
Cerberus 4B v2 Abliterated
Complete refusal ablation. Total cognitive freedom. Built on Qwen 2.5 4B.
Cerberus 4B v2
Reference weights. Use when you want maximum fidelity and you have ≥ 16 GB VRAM to spare.
Download F16
Cerberus 4B v2
Best quality-to-size ratio. Indistinguishable from F16 in most generations. Fits on an 8 GB GPU.
Download Q8_0
Cerberus 4B v2
For laptops, low-VRAM builds, and anything tight on disk. Default pick when GPU detection finds < 8 GB.
Download Q4_K_M
Arbiter GL9b
9B GLM-4 base. Unfiltered and highly intelligent — when you need more reasoning headroom than a 4B can give.
Arbiter GL9b
Best quality-to-size ratio for the 9B class. Fits on a 6–8 GB GPU with room for context.
Download Q4_K_M
Arbiter GL9b
Sweet spot between Q2 and Q4 — strong quality at low memory. The 9B you can run on a 4 GB card.
Download Q3_K_M
Gamma3 1B Abliterated
Compact 1B parameter model. Lightweight enough for CPU-only inference, edge devices, and mobile — without the corporate guardrails.
Gamma3 1B
Reference weights for the 1B class. Maximum fidelity at a footprint smaller than most 4B Q4 builds.
Download F16
Gamma3 1B
Best quality-to-size ratio. Runs well on phones, Raspberry Pi-class hardware, and CPU-only laptops.
Download Q8_0
Gamma3 1B
For tight-budget edge deployment. Sub-1 GB on disk — the smallest weight class we host.
Download Q4_K_M
Skip the GPU. Just call the endpoint.
OpenAI-compatible. Streaming. Pay-as-you-go credits. Self-hosted control plane on access.cerberusai.dev — your keys, your usage, no hidden middlemen.
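Because the API is OpenAI-compatible, requests follow the standard chat-completions schema. A minimal payload builder; the model name is a placeholder, and the base URL and key come from your dashboard on access.cerberusai.dev:

```python
def chat_request(model: str, prompt: str, stream: bool = True) -> dict:
    # Request body in the OpenAI /v1/chat/completions shape the API mirrors.
    # POST this (with your API key as a Bearer token) to the inference
    # base URL shown in your dashboard.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
```

Any existing OpenAI SDK works unchanged once its base URL is pointed at the endpoint.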
Pay for usage. Skip the local rig.
Three monthly tiers. Each renews credits and unlocks Desktop App access. Mid + EXP also get premium model downloads (Q8/F16/Q4_K_M-9B).
Stripe and PayPal supported · Cancel anytime · 1 USD = 10,000 credits
- ✓ Solo use
- ✓ API key access
- ✓ Desktop App Access
- ✓ Great for testing

- ✓ Daily workflows
- ✓ More session headroom
- ✓ Better value per credit
- ✓ Best for builders
- ✓ Desktop App Access
- ✓ Premium model downloads

- ✓ Longest runtime cushion
- ✓ Ideal for larger prompts
- ✓ Best for repeated API calls
- ✓ Priority-ready posture
- ✓ Desktop App Access
- ✓ Premium model downloads
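The credit math from the pricing note above is a fixed conversion, sketched here for budgeting:

```python
CREDITS_PER_USD = 10_000  # from the pricing note: 1 USD = 10,000 credits


def usd_to_credits(usd: float) -> int:
    # How many credits a top-up buys.
    return round(usd * CREDITS_PER_USD)


def credits_to_usd(credits: int) -> float:
    # What a given credit balance is worth in dollars.
    return credits / CREDITS_PER_USD
```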
Join the pack.
Builders, researchers, and people who think "as a language model" is a refusal. Trade prompts, models, and benchmarks.