Chaos engineering with your RBAC
RBAC-native Kubernetes chaos engineering, driven from the API server
kind: Killing
name: frontend-pod-killer
namespace: production
schedule:
  interval: 5m
  initialDelay: 30s
selector:
  labels:
    app: frontend
    tier: web
minAvailable: 2
dryRun: false
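The minAvailable field in the scenario above reads like a safety floor. As a rough illustration only, the sketch below assumes it means "never disrupt so many pods that fewer than minAvailable matching pods remain running". The function name and the `requested` parameter are hypothetical and are not taken from chaos_zookoo.

```python
def max_victims(running: int, min_available: int, requested: int = 1) -> int:
    """Return how many pods may be disrupted without breaching the floor.

    Assumption: the floor counts currently running pods that match the
    selector; the tool's real semantics may differ.
    """
    headroom = max(running - min_available, 0)  # pods above the floor
    return min(requested, headroom)


if __name__ == "__main__":
    # Three frontend pods with a floor of two leaves room for one kill.
    print(max_victims(running=3, min_available=2))
```

Under this reading, a scenario that finds the workload already at its floor becomes a no-op rather than an outage.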
Built for SRE teams
Rehearse the failure before production does it for you.
Every incident you ship to prod is a rehearsal you skipped. chaos_zookoo turns crash and recovery scenarios into versioned, reproducible YAML — so you can prove your workloads survive the fault in staging, on a schedule, with an auditable pass/fail signal, long before a pager wakes anyone up.
- 01
Describe the failure
Pick a workload, a fault kind (kill, mass kill, restart), a cadence, and a safety floor. One YAML doc per scenario — readable by anyone on the team.
- 02
Run & observe
Fire synthetic traffic during the disruption, then query Prometheus to assert the SLO held. Results land in your existing Grafana dashboards.
- 03
Ship with evidence
A green chaos_test_success is a reproducible signal that the workload recovers. Gate your release on it, not on hope.
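To make the "query Prometheus, then gate the release" steps concrete, here is a minimal sketch of a CI check built on the standard Prometheus HTTP API (`/api/v1/query`). The server address, the threshold, and whatever PromQL expression you pass in are assumptions for illustration; chaos_zookoo's own assertion mechanism may work differently.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://prometheus.example:9090"  # hypothetical address


def instant_query(base_url: str, promql: str) -> dict:
    """Run an instant query via the Prometheus HTTP API and parse the JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def slo_held(response: dict, minimum: float) -> bool:
    """True only if the query succeeded and every sample is >= minimum.

    Assumes an instant-vector result, where each sample looks like
    {"metric": {...}, "value": [timestamp, "number-as-string"]}.
    """
    if response.get("status") != "success":
        return False
    data = response.get("data", {})
    if data.get("resultType") != "vector":
        return False
    samples = data.get("result", [])
    if not samples:
        return False  # missing data counts as a failure, not a pass
    return all(float(sample["value"][1]) >= minimum for sample in samples)
```

A pipeline step could call `instant_query` with an expression of your choosing and exit non-zero when `slo_held` returns False, which blocks the release. Treating an empty result as a failure is a deliberate choice here: a chaos run that produced no metrics has proven nothing.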
Why chaos_zookoo
Precision chaos, minimal footprint
A single long-running process authenticated as a ServiceAccount. No custom resources, no operator, no privileged components.
RBAC-native security
No cluster-admin required. The ServiceAccount RBAC is the security model — grant exactly the chaos permissions you intend, nothing more.
API-server only
Every disruption is a regular API call — EvictV1, Pods.Delete, Deployments.Patch. No privileged nodes, no DaemonSets, no sidecars.
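As an illustration of what "grant exactly the chaos permissions you intend" could look like for the three calls named above, the sketch below builds a namespaced Kubernetes Role as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts. The role name and the extra `get`/`list` verbs (assumed to be needed for selecting pods by label) are guesses, not the project's documented manifest.

```python
import json


def chaos_role(namespace: str, name: str = "chaos-runner") -> dict:
    """Build a least-privilege Role covering evict, delete, and patch.

    The name and the read verbs are illustrative assumptions.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            # Pods.Delete, plus reads assumed for label-based selection
            {"apiGroups": [""], "resources": ["pods"],
             "verbs": ["get", "list", "delete"]},
            # Evictions are created on the pods/eviction subresource
            {"apiGroups": [""], "resources": ["pods/eviction"],
             "verbs": ["create"]},
            # Deployments.Patch, for example to trigger a restart
            {"apiGroups": ["apps"], "resources": ["deployments"],
             "verbs": ["get", "patch"]},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(chaos_role("production"), indent=2))
```

A RoleBinding that ties this Role to the tool's ServiceAccount is still required. Dropping a rule, such as the `deployments` entry, removes that fault kind from what the process is able to do in that namespace.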
YAML-driven config
Declare scenarios in familiar YAML. Each document maps to a module — Killing, GorillaKill, Rollout — with its own schedule and selectors.
Composable middlewares
Wrap any module with synthetic HTTP load generation and post-run Prometheus assertions — without touching the module code.
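The wrapping idea can be pictured as ordinary function composition. The sketch below is a generic illustration in Python, not chaos_zookoo's actual interfaces: `Module`, `with_load`, `with_assertion`, and `ChaosCheckFailed` are all hypothetical names.

```python
from typing import Callable

Module = Callable[[], None]  # hypothetical: a module is anything runnable


class ChaosCheckFailed(Exception):
    """Raised when the post-run assertion does not hold."""


def with_load(module: Module,
              start_load: Callable[[], Callable[[], None]]) -> Module:
    """Start load before the module runs; stop it afterwards, even on error."""
    def wrapped() -> None:
        stop = start_load()  # start_load returns the function that stops it
        try:
            module()
        finally:
            stop()
    return wrapped


def with_assertion(module: Module, check: Callable[[], bool]) -> Module:
    """Run the module, then fail loudly if the check returns False."""
    def wrapped() -> None:
        module()
        if not check():
            raise ChaosCheckFailed("post-run assertion failed")
    return wrapped
```

Composed as `with_assertion(with_load(kill, start_load), check)`, the fault code stays unaware of both the traffic and the assertion, which is the property the text describes.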
Measurable resilience
Each passing run is a proof point, not an opinion. Accumulate a versioned track record of recovery — and gate releases on it instead of hope.
Ready to break things safely?
Follow the installation guide and run your first chaos scenario in minutes.