Daily Operations

The day-2 playbook: what you check each morning, what you tune, and how to keep an eye on capacity and ingest health. Since v1.0.

Customer-specific values (hostname, project, region) are shown as placeholders.

1. Operations Dashboard

URL: https://<your-hostname>/dashboard/

The Dashboard is the first surface you check each day. It reads from pre-computed aggregation tables, not raw logs, so it loads in under a second even on estates with tens of millions of rows.

Filters: Source (f5 gateway-edge / runtime / both), Direction (inbound / outbound / both), and Time window (1h / 24h / 7d / 30d).

KPIs (top row): Requests, Errors (count + rate), p95 latency, Active services.

Tiles: Traffic timeline (5-minute buckets), Top 5 services by traffic, Top 5 services by errors, and a virtual-server-scoped failure breakdown for gateway-edge traffic.

Drill-down: click any service row to open Service detail — per-operation traffic, error rate, latency, and a link to recent logs filterable by correlation ID.

2. Ingest / live traffic

Live traffic log: https://<your-hostname>/observability/f5-live/ (available when the install profile includes f5_modernization; for other gateway-edge sources use Observability → Log Search, which has the same layout). It shows individual requests in near real time with correlation ID, service URI, method and status, request and response payloads and headers, and latency.

Use it to:

Investigate a specific consumer-reported failure (filter by correlation ID).
Spot-check that a newly onboarded service is sending real data.
Diagnose payload encoding / charset issues.

Ingest-health check

The dashboards depend on a steady ingest feed. The authoritative "is data still arriving?" check is the freshness of the raw ingest table:

SELECT MAX(created_at) FROM s2r_f5_request_log;

If that timestamp is recent, traffic is flowing — regardless of any upstream zero-window alert. If it is stale, work through Troubleshooting → "ingest stopped flowing".
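
A minimal sketch of how the freshness decision could be scripted once the query result is in hand. The 15-minute threshold and the helper name `ingest_is_fresh` are assumptions for illustration; the platform does not define what counts as "recent", so tune the value to your traffic pattern. The sketch also assumes the timestamp is timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed threshold: not a platform default, adjust to your traffic pattern.
MAX_INGEST_AGE = timedelta(minutes=15)


def ingest_is_fresh(latest: Optional[datetime],
                    now: Optional[datetime] = None,
                    max_age: timedelta = MAX_INGEST_AGE) -> bool:
    """Return True if the newest ingest row is no older than max_age.

    `latest` is the result of
    SELECT MAX(created_at) FROM s2r_f5_request_log;
    It is None when the table is empty, which is treated as stale.
    """
    if latest is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now - latest <= max_age


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(ingest_is_fresh(now - timedelta(minutes=3), now))  # True
    print(ingest_is_fresh(now - timedelta(hours=2), now))    # False
```

Passing `now` explicitly keeps the check deterministic in tests; in a scheduled job you would omit it and let the function use the current UTC time.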

3. Updating the catalog

Re-running discovery

The platform aggregates discovered services on a ~5-minute cycle. Force a refresh from Discovery → Refresh (re-aggregates the recent window) — useful after a gateway config change to confirm new traffic is being picked up.

Importing a new WSDL version

When a backend team publishes a new WSDL revision:

Services → your service → Schema → Replace WSDL, then upload or link it.
The platform diffs against the existing schema and shows added / removed / changed operations and types.
Review the diff. Existing field mappings are preserved where names match; new fields are auto-mapped; removed fields are flagged.
Publish creates a new immutable revision. Existing consumers stay on the prior revision until you mark the new one as default.
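
The added / removed / changed view in the diff step can be pictured as a set comparison over operation names. The sketch below is an illustration only, not the platform's implementation: it assumes each schema revision has already been reduced to a mapping from operation name to a signature string, and `diff_operations` is a hypothetical helper.

```python
from typing import Dict, List, NamedTuple


class SchemaDiff(NamedTuple):
    added: List[str]
    removed: List[str]
    changed: List[str]


def diff_operations(old: Dict[str, str], new: Dict[str, str]) -> SchemaDiff:
    """Compare two revisions, each a {operation_name: signature} mapping."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Present in both revisions, but the signature differs.
    changed = sorted(name for name in old.keys() & new.keys()
                     if old[name] != new[name])
    return SchemaDiff(added, removed, changed)


if __name__ == "__main__":
    # Hypothetical revisions of one service.
    v1 = {"GetOrder": "(id: int) -> Order", "CancelOrder": "(id: int) -> bool"}
    v2 = {"GetOrder": "(id: string) -> Order", "ListOrders": "() -> Order[]"}
    diff = diff_operations(v1, v2)
    print("added:", diff.added)
    print("removed:", diff.removed)
    print("changed:", diff.changed)
```

In this picture, anything in `removed` or `changed` is what deserves attention during review, since existing consumers may still depend on it.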

Rollback: Services → your service → Revisions → select prior → Make default.

4. Settings

URL: https://<your-hostname>/settings/

Auth mode (read-only): pristine (first boot, transient), protected (normal RBAC), recovery (operator-bootstrap path active).
Install profile: f5_modernization, generic, or both. Change rarely; existing services are unaffected.
Topology labels: free-text labels (prod, staging, dr) surfaced in audit and structured logs for filtering.
Excluded consumer IP ranges: CIDR list excluded from aggregation (e.g. internal monitors / synthetic tests) so dashboards reflect real consumer traffic.
Retention: governs how long the raw traffic archive is kept (default 60 days, configurable). See Data retention and Backup & restore.
License: current edition, status, expiry, service count vs cap, and the renewal-token upload. See Licensing.
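
To sanity-check an exclusion list before saving it, the CIDR matching behind "Excluded consumer IP ranges" can be reproduced with Python's standard `ipaddress` module. This is a sketch of the general technique, not the platform's own matching code; the ranges and addresses are made-up examples.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_ranges(cidrs: Iterable[str]) -> List[Network]:
    """Parse CIDR strings; raises ValueError on a malformed entry."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_excluded(consumer_ip: str, excluded: Iterable[Network]) -> bool:
    """Return True if the consumer IP falls inside any excluded range."""
    addr = ipaddress.ip_address(consumer_ip)
    # An IPv4 address tested against an IPv6 network (or vice versa) is False.
    return any(addr in net for net in excluded)


if __name__ == "__main__":
    # Example ranges only: an internal monitor subnet and a synthetic-test subnet.
    ranges = parse_ranges(["10.0.0.0/8", "192.168.50.0/24"])
    print(is_excluded("10.1.2.3", ranges))     # True
    print(is_excluded("203.0.113.7", ranges))  # False
```

Running known monitor and known consumer addresses through a check like this helps confirm that a new range excludes only what you intend.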

5. Update channel

URL: https://<your-hostname>/settings/updates/

The platform checks daily for available updates. The Updates tab shows the current version, any newer available version, release notes, and an update mode: manual (default — you approve each update) or automatic (applied within a configured maintenance window). Manual mode is recommended for production; the apply flow pulls images, runs migrations, rolls services one at a time, verifies each new revision, and rolls back automatically on failure.

If an update fails to roll out, see Troubleshooting → "Update applied but service won't start".

6. Capacity check

The platform is engineered for a ~1M-calls/day baseline (≈ 12 rps sustained, ~100 rps burst). To check you have headroom:

Requests KPI over a 24h or 30d window, compared against the baseline.
p95 latency trend — a steady climb signals a saturating backend or undersized runtime.
Runtime instances/replicas — keep a warm minimum (min-instances ≥ 1 on Cloud Run; ≥ 1 replica on Helm) to avoid cold-start latency, and scale the runtime horizontally for sustained higher throughput.
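
The baseline arithmetic is easy to reproduce when comparing your own numbers. The sketch below converts calls per day to an average request rate and reports headroom against the 1M-calls/day baseline; the helper names and the 650,000-calls/day figure are illustrative assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BASELINE_CALLS_PER_DAY = 1_000_000  # baseline stated in this section


def average_rps(calls_per_day: float) -> float:
    """Average requests per second implied by a daily call volume."""
    return calls_per_day / SECONDS_PER_DAY


def headroom(observed_calls_per_day: float,
             baseline: float = BASELINE_CALLS_PER_DAY) -> float:
    """Unused fraction of the baseline; negative means over baseline."""
    return 1.0 - observed_calls_per_day / baseline


if __name__ == "__main__":
    print(f"{average_rps(BASELINE_CALLS_PER_DAY):.1f} rps")  # 11.6 rps
    # Hypothetical estate handling 650,000 calls/day.
    print(f"{headroom(650_000):.0%} headroom")               # 35% headroom
```

An average rate hides bursts, so read the result alongside the p95 latency trend rather than on its own.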

See Metrics & SLO reports.

7. Suggested daily checklist

Open the Dashboard; scan KPIs and Top 5 errors.
If anything is red: drill into the service, inspect recent failures in the live traffic log, decide remediate vs accept.
Glance at the License banner — anything expiring soon?
Glance at Updates — anything pending? If so and you are in manual mode, schedule a window.
Confirm ingest freshness (SELECT MAX(created_at) FROM s2r_f5_request_log;).
(Weekly) Review the Discovery onboarding queue for new services ready to onboard.
(Monthly) Review retention settings — the raw archive can grow large on high-traffic installs.
