Monitoring
Prometheus + Grafana Signals Lab
SLO drafting, recording rules, and alert fatigue triage with synthetic checks that behave like production surprises.
- Format: Night labs + mentor AMA
- Duration: 5 weeks · nightly office hours optional
- Tuition (informational): KRW 1,320,000
- Mentor: Mateo Silva
Program narrative
Labs emit intentionally noisy metrics so you practice deciding when to silence and when to fix. We weave incident retrospectives into dashboards, annotating spikes with human-readable context instead of leaving empty charts.
What is included
- Histogram bucket tuning with concrete SLIs
- Alertmanager routing trees with on-call shadowing
- Recording rule cost tradeoff spreadsheet
- Exemplar tracing bridge to Tempo (read-only)
- Dashboard review rubric used by release mentors
- Post-incident template aligned to internal comms style
- Dark launch metric canary exercise
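To give a flavor of the recording-rule and histogram-bucket work, here is a minimal sketch of a Prometheus recording rule that pre-aggregates a latency SLI. The metric name, rule group name, and window are illustrative assumptions, not actual course material:

```yaml
# prometheus-rules.yml sketch — names are hypothetical.
groups:
  - name: latency-sli-example
    rules:
      # Pre-compute the p99 latency once, so dashboards and alerts
      # query one cheap series instead of re-running histogram_quantile
      # over raw buckets on every refresh (the cost tradeoff the
      # spreadsheet above is about).
      - record: job:http_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(
            0.99,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )
```

The tradeoff: each recording rule costs storage and evaluation time on every interval, in exchange for cheaper, faster queries downstream.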
Outcomes you can show
- Ship a three-tier SLO doc tied to business KPIs
- Reduce paging noise with documented routes
- Facilitate a retro using our annotation pattern
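"Documented routes" refers to Alertmanager's routing tree, which decides what pages a human versus what files a ticket. A hedged sketch, assuming hypothetical receiver names and severity labels:

```yaml
# alertmanager.yml fragment — receivers and label values are assumptions.
route:
  receiver: default-ticket          # non-matching alerts file a ticket, never page
  group_by: [alertname, service]
  routes:
    - matchers: ['severity="page"']
      receiver: oncall-pager        # only explicitly paging severities wake someone
    - matchers: ['severity="ticket"']
      receiver: default-ticket
receivers:
  - name: oncall-pager
  - name: default-ticket
```

Making the catch-all receiver non-paging is one common way to reduce noise: nothing pages unless a rule author deliberately labeled it `severity="page"`.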
Mateo Silva
Monitoring specialist; previously embedded with SaaS operations groups in Singapore.
Cohort FAQ
We integrate Thanos concepts but do not host long-term retention—bring your vendor or self-host plan.
ServiceMonitor examples exist, but you can complete core modules with docker-compose profiles.
Office hours are shared across cohorts; critical blockers get async Loom walkthroughs within 36 hours.
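For the docker-compose path mentioned above, a minimal sketch of a profile-gated Prometheus + Grafana stack. Image tags, ports, and the `core` profile name are assumptions, not the cohort's actual files:

```yaml
# docker-compose.yml sketch — tags and ports are illustrative.
services:
  prometheus:
    image: prom/prometheus:v2.53.0
    profiles: [core]                # started with: docker compose --profile core up
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
  grafana:
    image: grafana/grafana:11.1.0
    profiles: [core]
    ports:
      - "3000:3000"
```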
Experience notes
Histogram lab tied to the Linux cohort journalctl filters—nice cross-course continuity.
Ivy · Early-career developer · 5/5 · survey
Alertmanager routing tree exercise exposed gaps in our on-call tree—constructive discomfort.
Kenji · Support team lead