Incident Simulator
Jul 2025 – Aug 2025 | SRE / Chaos Engineering

Platform for simulating real production incidents.

The Problem

Engineers rarely experience realistic system outages during development, which leaves them less prepared for real production failures.

The Solution

Incident Simulator creates controlled outages such as latency spikes and database failures.

Implementation Details

Theory is one thing; debugging a live system while the alerts are firing is another. The Incident Simulator is a safe space to fail.

Controlled Chaos

The platform uses custom Docker images that can be "poisoned" via an API call. You can inject high latency, simulate 500 errors, or kill the primary database node to see how the system (and you) react.
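
To make the "poisoning" idea concrete, here is a minimal sketch of how a service could expose switchable faults. It is not the project's actual code: `PoisonableService`, `FaultConfig`, `poison`, `heal`, and the field names are all hypothetical, and the sketch covers only latency and 500 errors, not database node failures. It assumes the poison API call ends up updating an in-process fault configuration that every request consults.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)


@dataclass
class FaultConfig:
    """Hypothetical fault settings; the field names are illustrative."""
    latency_ms: int = 0       # extra delay added to every request
    error_rate: float = 0.0   # fraction of requests answered with a 500


class PoisonableService:
    """Wraps a request handler so faults can be switched on at runtime."""

    def __init__(
        self,
        handler: Callable[[str], Response],
        sleep: Callable[[float], None] = time.sleep,
        rng: Callable[[], float] = random.random,
    ) -> None:
        self._handler = handler
        self._sleep = sleep   # injectable so tests do not really wait
        self._rng = rng       # injectable so tests are deterministic
        self.fault = FaultConfig()

    def poison(self, latency_ms: int = 0, error_rate: float = 0.0) -> None:
        """What an assumed 'poison' API endpoint might call internally."""
        if latency_ms < 0 or not 0.0 <= error_rate <= 1.0:
            raise ValueError("latency_ms must be >= 0 and error_rate in [0, 1]")
        self.fault = FaultConfig(latency_ms, error_rate)

    def heal(self) -> None:
        """Clear all injected faults."""
        self.fault = FaultConfig()

    def handle(self, request: str) -> Response:
        # Inject latency first, so failed requests are slow as well.
        if self.fault.latency_ms:
            self._sleep(self.fault.latency_ms / 1000)
        # rng() is in [0, 1), so 0.0 never fails and 1.0 always fails.
        if self._rng() < self.fault.error_rate:
            return 500, "injected failure"
        return self._handler(request)
```

Under these assumptions, calling `svc.poison(latency_ms=800, error_rate=0.3)` would slow every request by 800 ms and fail roughly 30% of them until `svc.heal()` is called. Keeping the fault state separate from the handler means the healthy code path stays untouched when no fault is active.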

Game Days

I've used this tool to run "SRE Game Days", where teams must diagnose and fix a simulated outage within a set time limit. These sessions have been instrumental in improving incident response times.