| Metric | Description | Alert Threshold |
|---|---|---|
| Success Rate | Proportion of successful requests | < 99% |
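| P50 Latency | Median response time: 50% of requests complete at or below this value | Varies by model |
| P99 Latency | 99th-percentile response time: 99% of requests complete at or below this value | > 3x baseline |
| Error Rate | Proportion of requests that return an HTTP 5xx error | > 1% |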
| Timeout Rate | Proportion of timed-out requests | > 0.5% |
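
The metrics in the first table can be computed from raw request records roughly as follows. This is a minimal Python sketch under assumptions not stated in the table: the `Request` record shape is hypothetical, a request counts as successful only if it neither timed out nor returned a 4xx/5xx status, percentiles use the nearest-rank method (monitoring backends often estimate them differently, e.g. from histogram buckets), and the P99 baseline is supplied by the caller. The P50 check is omitted because its threshold is model-specific.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record shape; adapt to whatever your request logs contain.
    status: int              # HTTP status code (ignored when timed_out is True)
    latency_ms: float        # observed response time in milliseconds
    timed_out: bool = False


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def compute_metrics(requests):
    n = len(requests)
    if n == 0:
        raise ValueError("no requests")
    timeouts = sum(1 for r in requests if r.timed_out)
    errors_5xx = sum(1 for r in requests if not r.timed_out and 500 <= r.status < 600)
    # Assumption: "success" means no timeout and no 4xx/5xx status.
    successes = sum(1 for r in requests if not r.timed_out and r.status < 400)
    # Timed-out requests have no real response time, so they are excluded here.
    latencies = [r.latency_ms for r in requests if not r.timed_out]
    return {
        "success_rate": successes / n,
        "error_rate": errors_5xx / n,
        "timeout_rate": timeouts / n,
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
    }


def check_alerts(metrics, p99_baseline_ms):
    """Return the names of metrics that breach the alert thresholds in the table."""
    alerts = []
    if metrics["success_rate"] < 0.99:
        alerts.append("success_rate")
    if metrics["p99_ms"] > 3 * p99_baseline_ms:
        alerts.append("p99_latency")
    if metrics["error_rate"] > 0.01:
        alerts.append("error_rate")
    if metrics["timeout_rate"] > 0.005:
        alerts.append("timeout_rate")
    return alerts


if __name__ == "__main__":
    sample = (
        [Request(200, 100.0)] * 985
        + [Request(503, 50.0)] * 10
        + [Request(0, 30_000.0, timed_out=True)] * 5
    )
    m = compute_metrics(sample)
    print(check_alerts(m, p99_baseline_ms=80.0))  # ['success_rate']
```

In this sample, 10 of 1,000 requests return 503 and 5 time out, so the error rate is exactly 1% and the timeout rate exactly 0.5%. Because those thresholds are strict inequalities, neither fires; only the 98.5% success rate triggers an alert.
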

| Metric | Target |
|---|---|
| Monthly Availability | 99.9% |
| Incident Response Time | Within 1 hour for critical issues |
| Incident Recovery Time | Within 30 minutes |
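
The 99.9% monthly availability target corresponds to a concrete downtime budget of (1 − 0.999) × the number of minutes in the month, about 43.2 minutes for a 30-day month. A minimal sketch, assuming availability is measured as the fraction of wall-clock time the service is up during a calendar month (request-based availability is another common definition); the function names are illustrative.

```python
import calendar

MINUTES_PER_DAY = 24 * 60


def minutes_in_month(year: int, month: int) -> int:
    # calendar.monthrange returns (weekday of the 1st, number of days in the month)
    return calendar.monthrange(year, month)[1] * MINUTES_PER_DAY


def downtime_budget_minutes(target: float, year: int, month: int) -> float:
    """Maximum downtime, in minutes, that still meets `target` availability."""
    return (1 - target) * minutes_in_month(year, month)


def meets_target(downtime_minutes: float, target: float, year: int, month: int) -> bool:
    """True if the month's measured (time-based) availability is at least `target`."""
    availability = 1 - downtime_minutes / minutes_in_month(year, month)
    return availability >= target


print(round(downtime_budget_minutes(0.999, 2024, 4), 2))  # 43.2 (April, 30 days)
print(round(downtime_budget_minutes(0.999, 2024, 1), 2))  # 44.64 (January, 31 days)
print(meets_target(50, 0.999, 2024, 4))                    # False
```

Under this definition, a single outage lasting the full 30-minute recovery target already consumes nearly 70% of a 30-day month's budget.
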