Use Case 01
PostgreSQL cluster. CPU at 62%, memory at 72%. No alerts firing. On-call team sees nothing wrong.
What's Actually Happening (Without ITOM)
Over 6 days: connection pool crept from 68% → 91%. Write latency drifted 4ms → 47ms. An analytics query held row locks. No individual metric breached a threshold.
What ITOM Does — Step by Step
- Ingests time-series data across CPU, memory, connection pool, lock wait time, and write latency simultaneously
- Computes composite health score — detects trajectory convergence toward failure even when no single metric alerts
- Identifies offending query via session analysis: Session ID 48291 holding row-level locks
- Fires predictive alert: connection pool exhaustion predicted in ~14 hours
ITOM Alert Output
> ITOM Alert: DB-PROD-CLUSTER-01
> Anomaly: Connection pool 91% (+23% over 6d)
> Write latency deviation: 3.4σ above 30d baseline
> Probable cause: Long-running query — Session ID 48291
> Estimated time to failure: ~14 hours
Without ITOM vs. With ITOM
Without ITOM: Database crashes at 2 AM. War room for 4+ hours.
With ITOM: DBA terminates query. Write latency normalizes in 30 min. No outage.
Key Metrics
- 6 days — Silent degradation
- 3.4σ — Latency deviation detected
- ~14h — Warning before crash
- 0 min — Downtime