The day-2 playbook: what you check each morning, what you tune, and how to keep an eye on capacity and ingest health. Since v1.0.
Customer-specific values (hostname, project, region) are shown as placeholders.
URL: https://<your-hostname>/dashboard/
The Dashboard is the first surface you check each day. It reads from pre-computed aggregation tables, not raw logs, so it loads sub-second even on estates with tens of millions of rows.
Filters: Source (f5 gateway-edge / runtime / both), Direction
(inbound / outbound / both), and Time window (1h / 24h / 7d / 30d).
KPIs (top row): Requests, Errors (count + rate), p95 latency, Active services.
Tiles: Traffic timeline (5-minute buckets), Top 5 services by traffic, Top 5 services by errors, and a virtual-server-scoped failure breakdown for gateway-edge traffic.
Drill-down: click any service row to open Service detail — per-operation traffic, error rate, latency, and a link to recent logs filterable by correlation ID.
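To make the top-row KPIs concrete, here is a minimal sketch of how they could be derived from one time window of request records. It is an illustration, not the platform's aggregation code: the `Request` record, the `kpis` and `p95` helpers, the nearest-rank percentile method, and the choice to count statuses of 500 and above as errors are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record shape, not the platform's schema.
    status: int        # HTTP status code
    latency_ms: float  # request latency in milliseconds
    service: str       # service identifier


def p95(values):
    """Nearest-rank 95th percentile; returns None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)
    # Integer form of ceil(0.95 * n), giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def kpis(requests, error_from=500):
    """Compute Requests, Errors (count + rate), p95 latency, Active services.

    error_from is an assumed threshold: statuses at or above it are errors.
    """
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= error_from)
    return {
        "requests": total,
        "errors": errors,
        "error_rate": errors / total if total else 0.0,
        "p95_latency_ms": p95([r.latency_ms for r in requests]),
        "active_services": len({r.service for r in requests}),
    }
```

Computing these once per bucket and storing the result is the general idea behind reading from aggregation tables instead of scanning raw logs on every page load.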
Live traffic log: https://<your-hostname>/observability/f5-live/ (when the
install profile includes f5_modernization; for other gateway-edge sources use
Observability → Log Search, same shape). Shows individual requests in
near-real-time with correlation ID, service URI, method + status, request and
response payloads and headers, and latency.
Use it to:
The dashboards depend on a steady ingest feed. The authoritative "is data still arriving?" check is the freshness of the raw ingest table:
SELECT MAX(created_at) FROM s2r_f5_request_log;
If that timestamp is recent, traffic is flowing — regardless of any upstream zero-window alert. If it is stale, work through Troubleshooting → "ingest stopped flowing".
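If you want to script this check, the decision can be sketched as a small function that takes the result of the query above. The name `ingest_is_fresh`, the 10-minute default threshold, and the assumption that `created_at` comes back as a timezone-aware UTC timestamp are illustrative choices, not platform defaults; pick a threshold that matches your own traffic pattern.

```python
from datetime import datetime, timedelta, timezone


def ingest_is_fresh(latest_created_at, now=None, max_lag=timedelta(minutes=10)):
    """Return True when the newest ingested row is recent enough.

    latest_created_at: the MAX(created_at) value, assumed to be a
    timezone-aware datetime, or None when the table is empty.
    max_lag: assumed threshold; 10 minutes is only an example.
    """
    if latest_created_at is None:
        return False  # an empty table counts as stale
    if now is None:
        now = datetime.now(timezone.utc)
    return now - latest_created_at <= max_lag
```

A `False` result is the cue to start on Troubleshooting → "ingest stopped flowing".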
The platform aggregates discovered services on a ~5-minute cycle. Force a refresh from Discovery → Refresh (re-aggregates the recent window) — useful after a gateway config change to confirm new traffic is being picked up.
When a backend team publishes a new WSDL revision:
Rollback: Services → your service → Revisions → select prior → Make default.
URL: https://<your-hostname>/settings/
pristine (first boot, transient), protected (normal RBAC), recovery (operator-bootstrap path active).
f5_modernization, generic, or both. Change rarely; existing services are unaffected.
prod, staging, dr) surfaced in audit and structured logs for filtering.
URL: https://<your-hostname>/settings/updates/
The platform checks daily for available updates. The Updates tab shows the
current version, any newer available version, release notes, and an update
mode: manual (default — you approve each update) or automatic (applied
within a configured maintenance window). Manual mode is recommended for
production; the apply flow pulls images, runs migrations, rolls services one at a
time, verifies each new revision, and rolls back automatically on failure.
If an update fails to roll out, see Troubleshooting → "Update applied but service won't start".
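The roll, verify, and roll-back part of the apply flow follows a common pattern, sketched below. This is not the platform's updater: `rolling_update` and its `deploy`, `verify`, and `rollback` hooks are hypothetical, and reverting every service updated so far (not only the failing one) is an assumption about the rollback scope.

```python
def rolling_update(services, deploy, verify, rollback):
    """Update services one at a time and undo on the first failure.

    deploy, verify and rollback are caller-supplied callables that each
    take a service name (hypothetical hooks, not a platform API).
    Returns True when every service was updated and verified.
    """
    updated = []
    for name in services:
        deploy(name)
        updated.append(name)
        if not verify(name):
            # Assumed policy: revert in reverse order everything touched so far.
            for done in reversed(updated):
                rollback(done)
            return False
    return True
```

Rolling one service at a time limits the blast radius: a bad revision is caught by its verification step before the next service is touched.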
The platform is engineered for a ~1M-calls/day baseline (≈ 12 rps sustained, ~100 rps burst). To check you have headroom:
SELECT MAX(created_at) FROM s2r_f5_request_log;).
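The baseline arithmetic can be checked with a short sketch: 1,000,000 calls spread over 86,400 seconds is about 11.6 requests per second, which the text rounds to 12. The helper names `sustained_rps` and `headroom` are illustrative, and the daily call count you feed in is whatever your own request counts show.

```python
SECONDS_PER_DAY = 86_400


def sustained_rps(calls_per_day):
    """Average requests per second for a given daily call volume."""
    return calls_per_day / SECONDS_PER_DAY


def headroom(observed_calls_per_day, baseline_calls_per_day=1_000_000):
    """Fraction of the baseline still unused; negative means over baseline."""
    return 1 - observed_calls_per_day / baseline_calls_per_day


# Example: 750,000 calls/day leaves a quarter of the baseline free.
print(round(sustained_rps(1_000_000), 1))
print(headroom(750_000))
```

This covers only the sustained average; burst capacity (~100 rps) has to be judged from peak buckets in the traffic timeline, not from the daily total.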