This incident has been resolved.
Nov 19, 10:34 UTC
At 9:05 UTC, the message bus for the alerting system had a failure. On failover, absence rules were triggered for many alerts. We are actively monitoring the issue and investigating a root cause.
Nov 19, 09:39 UTC
This incident has been resolved at 13:26 UTC. Any systems in alert at time of remedy will have had a notification sent out. Graphs data is not affected.
Nov 18, 13:26 UTC
Ingestion into the alerting platform was halted for 9 minutes due to redundant failures. Absence alerts were sent out since the system could not see data. Once resolved, these alerts should have cleared, and any new alerts during that time were sent out.
Nov 18, 13:17 UTC
The scheduled maintenance has been completed.
Nov 13, 16:30 UTC
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Nov 13, 15:30 UTC
We will be upgrading the metric aggregation system at this time. Some absence alerts with short thresholds may trigger due to delays in aggregation during the upgrade, but all metrics will be cached and stored normally within a few minutes.
Nov 11, 15:01 UTC
Metric aggregation has caught up.
Nov 11, 00:50 UTC
A metric aggregation component became backlogged, and some metric from the last hour may not immediately be visible on graphs. Alerting is unaffected, and all delayed metrics will be visible in the UI shortly.
Nov 11, 00:01 UTC