[Beta] WhatsApp Messaging Outage
Incident Report for Nexmo
Postmortem

What happened

At 12:14 UTC on 10 October 2019, we began experiencing a WhatsApp outage. Outbound WhatsApp messages failed with "Undeliverable" delivery receipts, and we stopped processing inbound messages, so they began to queue on WhatsApp's side. By 12:54 UTC, the changes we had made restored service for some of our users. The issue was fully resolved at 13:34 UTC: outbound WhatsApp messages were being delivered again, and the queued inbound messages were processed and delivered.

Causes

Deleting the legacy WhatsApp cluster management infrastructure accidentally removed shared permission objects. Without these permission objects, our current WhatsApp cluster management stopped serving traffic.

Preventive Actions

We are implementing stricter processes that should prevent changes like this from causing outages in the future.

Posted Oct 17, 2019 - 10:31 UTC

Resolved
This issue is now resolved. To clarify the details: the degradation started at around 12:14 UTC and, as previously stated, affected both inbound and outbound messages, but only for WhatsApp.

From 12:54 UTC, most of our WhatsApp instances were back online, and service was fully restored for all customers at 13:33 UTC.

A postmortem on this incident will be published.
Posted Oct 10, 2019 - 14:41 UTC
Monitoring
Today at 12:18 UTC, we detected an issue causing an outage of both inbound and outbound WhatsApp messages. During this period, outbound messages would have failed with an Undeliverable status. The issue was fixed at around 13:06 UTC, and both inbound and outbound messages should now be delivered successfully. We are continuing to monitor the service.

Only WhatsApp messages were affected; other channels available via the Messages and Dispatch APIs were not.
Posted Oct 10, 2019 - 13:37 UTC
This incident affected: [Beta] Messages API and [Beta] Dispatch API.