Nexmo Dashboard Analytics and Delivery data delayed
Incident Report for Nexmo
Postmortem

What happened

From 11-10-2018 05:00 AM UTC until 11-10-2018 11:30 AM UTC, aggregate data was queued rather than written, so reporting data did not appear in the Dashboard as expected.

Causes

An exceptionally large query overloaded the reporting database. This had a cascading impact on our data pipeline systems and severely delayed the writing of aggregated usage data. While the delay persisted, the pending data queue could not be cleared by adding write capacity alone.

To resolve the queueing issue quickly and with minimum disruption, we restarted the aggregation service. This flushed the pending queue and allowed the service to resume processing new data as normal, which immediately restored real-time aggregated reporting to the Dashboard. We then rebuilt the affected aggregates offline over the following two weeks. No data was lost.

Preventive Actions

Redesigning the queries to avoid unnecessary locking on read-only requests.

Implementing the new real-time aggregates functionality to prevent this from happening again.

Posted Nov 20, 2018 - 12:31 UTC

Resolved
The cause of our data delay has been resolved.

The following services were impacted from 11-10-2018 05:00 AM UTC until 11-10-2018 11:30 AM UTC:
Dashboard Analytics reports
Dashboard Delivery reports

Data for the period between 11-10-2018 05:00 AM UTC and 11-10-2018 11:30 AM UTC will remain inaccurate until we can restore it. All data from 11-10-2018 11:30 AM UTC onwards is present.

No other Dashboard reports or customer services were impacted.

Please refer to https://help.nexmo.com/hc/en-us/articles/360017851111-Dashboard-Analytics-and-Delivery-reports-data-delayed-11th-October-2018 for further updates on the status of the affected data.
Posted Oct 11, 2018 - 17:28 UTC
Monitoring
We have identified the root cause of the delayed data and are fixing it.

Analytics and Delivery data is up-to-date and accurate from 11-10-2018 11:30 AM UTC onwards.
Analytics and Delivery data for the period 11-10-2018 05:00 AM UTC until 11-10-2018 11:30 AM UTC will continue to be delayed until it can be corrected.

We will post further updates here when we have them.
Posted Oct 11, 2018 - 11:45 UTC
Investigating
Customers are experiencing problems with delayed data on the Nexmo Dashboard.

As a result, the following data is delayed:
Analytics reports from 11-10-2018 05:00 AM UTC onwards.
Delivery reports from 11-10-2018 05:00 AM UTC onwards.

Customer traffic and all other reports in the Dashboard are not impacted.

We will update this status as soon as this is resolved and the delayed data is available again.
Posted Oct 11, 2018 - 10:37 UTC
This incident affected: Nexmo Dashboard.