The platform exposes aggregated metrics and SLO reports over both fixed and custom time windows. Both read from pre-computed aggregation tables (never raw logs at query time), so queries return quickly even across large estates and long histories.
GET /admin/v1/metrics returns aggregated metrics grouped by (environment, serviceKey, operationKey):
| Metric | Meaning |
|---|---|
| requestCount | Number of requests in the window. |
| errorCount | Number of requests that ended in error or fault. |
| p50LatencyMs / p95LatencyMs / p99LatencyMs | End-to-end latency percentiles. |
| p95ConverterOverheadMs | The 95th-percentile conversion cost on top of backend latency. |
Converter overhead isolates the cost the conversion layer itself adds — computed as
max(0, latencyMs − backendLatencyMs) — so you can tell whether a latency change comes from
your backend or from conversion. When backend latency is unavailable, overhead falls back to
total latency.
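The overhead rule above can be sketched in Python. This is a minimal illustration, not the platform's implementation. It makes two assumptions: that p95ConverterOverheadMs is the 95th percentile of per-request overheads, and that the percentile uses the nearest-rank method, which the platform may not use. The function names are hypothetical.

```python
from typing import Optional


def converter_overhead_ms(latency_ms: float, backend_latency_ms: Optional[float]) -> float:
    """Conversion-layer cost for one request: max(0, latency - backend latency).

    Falls back to the total latency when backend latency is unavailable (None).
    """
    if backend_latency_ms is None:
        return latency_ms
    return max(0.0, latency_ms - backend_latency_ms)


def p95_nearest_rank(values: list[float]) -> float:
    """95th percentile by nearest rank (assumed method; the platform's is not specified)."""
    if not values:
        raise ValueError("p95 of an empty sample is undefined")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


# (latencyMs, backendLatencyMs) per request; None means backend latency was not reported.
requests = [(120.0, 100.0), (80.0, 100.0), (150.0, None), (210.0, 200.0)]
overheads = [converter_overhead_ms(total, backend) for total, backend in requests]
print(overheads)                    # [20.0, 0.0, 150.0, 10.0]
print(p95_nearest_rank(overheads))  # 150.0
```

In this tiny sample, the one request without a backend latency falls back to its full 150 ms, and that value becomes the p95 overhead. Keep the fallback in mind when reading this metric for services that do not report backend latency.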
GET /admin/v1/metrics accepts these parameters:
| Parameter | Meaning |
|---|---|
| window | One of 15m, 1h, 24h, 7d, 30d, 60d. |
| start / end | Optional ISO timestamps for a custom range anywhere inside the retained 60-day history. |
| environment | Optional, repeatable (multi-value filter). |
| serviceKey | Optional, repeatable (multi-value filter). |
Every windowed surface (metrics, SLO reports, runtime-log search, the per-service timeline, and audit-event search) accepts the same window vocabulary:
15m 1h 24h 7d 30d 60d
…plus a custom start/end range inside the retained 60-day window. The 60-day ceiling is
the default retention horizon (configurable); see Data retention.
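The Python below sketches how a client might assemble a metrics query from this shared vocabulary. It validates the window, checks that a custom range starts inside the default 60-day horizon, and repeats environment and serviceKey once per value. The helper name metrics_query and its client-side checks are hypothetical. In particular, requiring both start and end, and sending timestamps in Python's isoformat() form, are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence
from urllib.parse import urlencode

WINDOWS = ("15m", "1h", "24h", "7d", "30d", "60d")
RETENTION = timedelta(days=60)  # default horizon; the platform's retention is configurable


def metrics_query(
    window: Optional[str] = None,
    start: Optional[datetime] = None,  # expected to be timezone-aware
    end: Optional[datetime] = None,    # expected to be timezone-aware
    environments: Sequence[str] = (),
    service_keys: Sequence[str] = (),
    now: Optional[datetime] = None,
) -> str:
    """Build a GET /admin/v1/metrics path and query string (hypothetical client helper)."""
    params: list[tuple[str, str]] = []
    if window is not None:
        if window not in WINDOWS:
            raise ValueError(f"unsupported window: {window!r}")
        params.append(("window", window))
    if start is not None or end is not None:
        # Assumption: a custom range needs both ends, with start strictly before end.
        if start is None or end is None or start >= end:
            raise ValueError("custom range needs start < end")
        if now is None:
            now = datetime.now(timezone.utc)
        if start < now - RETENTION:
            raise ValueError("start falls outside the retained history")
        params += [("start", start.isoformat()), ("end", end.isoformat())]
    # Repeatable filters: one key=value pair per value.
    params += [("environment", e) for e in environments]
    params += [("serviceKey", s) for s in service_keys]
    return "/admin/v1/metrics?" + urlencode(params)


print(metrics_query(window="24h", environments=["prod", "staging"], service_keys=["billing"]))
# /admin/v1/metrics?window=24h&environment=prod&environment=staging&serviceKey=billing
```

Because retention is configurable, treat the 60-day check as a convenience. The server remains the authority on which ranges it accepts.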
GET /admin/v1/slo-report produces a service-level report with targets and breach detection.
You set the targets; the platform measures against them and flags breaches per time bucket.
| Parameter | Default | Meaning |
|---|---|---|
| availabilityTarget | 99.5 | The availability percentage you commit to. |
| p95LatencyTargetMs | 500 | The p95 latency ceiling you commit to. |
The report returns these fields:
| Field | Meaning |
|---|---|
| availabilityPct / availabilityTarget | Measured availability vs target. |
| p95LatencyMs / p95LatencyTargetMs | Measured p95 latency vs target. |
| availabilityBreach / p95LatencyBreach / breach | Whether the availability target, the p95 latency target, or either of them was breached over the window. |
| breachBucketCount | How many individual time buckets breached. |
| timeline[] | Per-bucket points, each with its own breach flags, so you can see when a breach occurred, not just that it did. |
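The report fields map onto a simple evaluation, sketched below under three assumptions. Availability is the share of requests that did not end in error or fault. A target is breached when availability falls strictly below it or p95 latency strictly exceeds it. Empty buckets count as fully available. The Bucket type, slo_report, and the per-point key names inside timeline are hypothetical. Overall p95 is passed in rather than derived, because percentiles from separate buckets cannot be combined exactly.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """One timeline bucket's pre-aggregated values (hypothetical shape)."""
    request_count: int
    error_count: int
    p95_latency_ms: float


def availability_pct(requests: int, errors: int) -> float:
    # Assumption: availability = percentage of requests that did not end in error or fault;
    # an empty bucket is treated as fully available.
    if requests == 0:
        return 100.0
    return 100.0 * (requests - errors) / requests


def breach_flags(avail: float, p95: float, avail_target: float, p95_target: float) -> dict:
    # Assumption: strict comparisons, so meeting a target exactly is not a breach.
    availability_breach = avail < avail_target
    p95_breach = p95 > p95_target
    return {
        "availabilityBreach": availability_breach,
        "p95LatencyBreach": p95_breach,
        "breach": availability_breach or p95_breach,
    }


def slo_report(
    buckets: list[Bucket],
    overall_p95_ms: float,
    availability_target: float = 99.5,    # documented default
    p95_latency_target_ms: float = 500.0,  # documented default
) -> dict:
    timeline = []
    for b in buckets:
        avail = availability_pct(b.request_count, b.error_count)
        point = {"availabilityPct": avail, "p95LatencyMs": b.p95_latency_ms}
        point.update(breach_flags(avail, b.p95_latency_ms, availability_target, p95_latency_target_ms))
        timeline.append(point)

    # Window-level availability is computed from summed counts, not averaged percentages.
    overall_avail = availability_pct(
        sum(b.request_count for b in buckets), sum(b.error_count for b in buckets)
    )
    report = {
        "availabilityPct": overall_avail,
        "availabilityTarget": availability_target,
        "p95LatencyMs": overall_p95_ms,
        "p95LatencyTargetMs": p95_latency_target_ms,
        "breachBucketCount": sum(1 for p in timeline if p["breach"]),
        "timeline": timeline,
    }
    report.update(breach_flags(overall_avail, overall_p95_ms, availability_target, p95_latency_target_ms))
    return report
```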
Filter the report by serviceKey, by window (or a custom start/end range), and apply the
targets above. Because the report is bucketed, a brief blip and a sustained outage are
distinguishable — both move availabilityPct, but breachBucketCount and the timeline[]
flags tell them apart.
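The sketch below makes the blip-versus-outage distinction concrete. It summarizes a sequence of per-bucket breach flags by how many buckets breached and by the longest run of consecutive breached buckets. It assumes you have already extracted each timeline[] point's overall breach flag into a list of booleans, since the per-point field name is not specified here. The classify_breach function and its three-bucket threshold are hypothetical choices, not platform behavior.

```python
from itertools import groupby


def breach_shape(flags: list[bool]) -> tuple[int, int]:
    """Return (number of breached buckets, longest run of consecutive breached buckets)."""
    count = sum(flags)  # True counts as 1
    longest = max(
        (sum(1 for _ in run) for breached, run in groupby(flags) if breached),
        default=0,
    )
    return count, longest


def classify_breach(flags: list[bool], sustained_after: int = 3) -> str:
    """Label a timeline 'none', 'blip', or 'sustained' (threshold is a hypothetical choice)."""
    count, longest = breach_shape(flags)
    if count == 0:
        return "none"
    return "sustained" if longest >= sustained_after else "blip"


# 24 hourly buckets: a single bad hour versus six bad hours in a row.
blip = [False] * 10 + [True] + [False] * 13
outage = [False] * 8 + [True] * 6 + [False] * 10
print(breach_shape(blip), classify_breach(blip))      # (1, 1) blip
print(breach_shape(outage), classify_breach(outage))  # (6, 6) sustained
```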