Voice API outage - 2017-09-13
Incident Report for Nexmo

WHAT HAPPENED

On September 13th 2017, between 15:31 and 16:18 UTC, Voice API requests originating from the US and EMEA regions received HTTP 500 responses. The outage also affected inbound calls.

At 16:18 UTC we failed over to one of our redundant data centers, so that investigation and resolution work in the US data center could continue with minimal customer impact. Voice API services ran from the redundant data center until September 14th 2017 at 14:20 UTC. During this time customers might have experienced increased latency and occasional API request failures.
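For clients affected by occasional failures of this kind, one common mitigation is to retry server-side errors with exponential backoff. The following is a minimal sketch only, not part of the Voice API or any Nexmo library: `send_request` is a hypothetical callable that performs one request and returns its HTTP status code, and the attempt count and delays are illustrative assumptions. Whether a given request is safe to retry (for example, one that starts an outbound call) depends on the request.

```python
import random
import time


def call_with_retry(send_request, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send_request until it returns a non-5xx status or attempts run out.

    send_request is a hypothetical callable (an assumption for this sketch)
    that performs one HTTP request and returns the status code as an int.
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send_request()
        if status < 500:
            # Success or a client error: retrying will not help.
            return status
        if attempt < max_attempts:
            # Exponential backoff (0.5s, 1s, 2s, ...) plus random jitter so
            # that many clients do not retry at the same instant.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    # Every attempt returned a 5xx status; hand the last one to the caller.
    return status
```

The `sleep` parameter exists only so the waiting behaviour can be replaced in tests.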

On September 14th 2017 at 14:20 UTC the issue was fully resolved with service re-established from the US data center.

CAUSES

Due to an increase in Voice traffic in the US data center, one of the database cluster nodes ran out of available memory. During the automatic traffic rebalancing process that followed, Voice API performance was impacted, resulting in HTTP 500 responses.

While the Voice API was running from the redundant data center, customers with IP whitelisting in place for Voice API callbacks, NCCO requests and audio file download requests would not have received these requests. The IP addresses of the redundant data center had not been published before the incident and were therefore absent from customer firewall rules.
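The whitelisting failure described above can be illustrated with a short sketch. The address ranges below are documentation-only placeholders (RFC 5737), not real Nexmo addresses, and `is_allowed` is a hypothetical helper standing in for a customer firewall rule. It shows why a request from the failover site is rejected until that site's ranges are also whitelisted.

```python
import ipaddress

# Placeholder ranges for illustration only (RFC 5737 documentation blocks).
# These are assumptions, not published Nexmo IP addresses.
PRIMARY_RANGES = ["192.0.2.0/24"]
FAILOVER_RANGES = ["198.51.100.0/24"]


def is_allowed(source_ip, allowed_ranges):
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


# A callback from the primary site passes a whitelist built from its ranges.
print(is_allowed("192.0.2.10", PRIMARY_RANGES))

# A callback from the failover site is rejected by that same whitelist...
print(is_allowed("198.51.100.7", PRIMARY_RANGES))

# ...and passes only once the failover ranges are whitelisted as well.
print(is_allowed("198.51.100.7", PRIMARY_RANGES + FAILOVER_RANGES))
```

Whitelisting both sets of ranges ahead of time is what lets callbacks keep arriving during a failover.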

CORRECTIVE ACTIONS

A program of work has been identified that includes:

  • Further increase of memory capacity allowance
  • Improvements in automated memory management
  • Review and optimization of failover process and tools
Posted 2 months ago. Sep 20, 2017 - 15:05 UTC

Resolved
This outage has now been completely resolved, as of 15:02:35. We will publish a full post-mortem shortly.
Posted 2 months ago. Sep 14, 2017 - 14:38 UTC
Update
This incident was resolved at 4:20 pm UTC.

This service is currently running from our disaster recovery location. Due to this, calls may experience additional latency.
Posted 2 months ago. Sep 13, 2017 - 19:50 UTC
Monitoring
This incident was resolved at 4:20 pm UTC.
Posted 2 months ago. Sep 13, 2017 - 16:45 UTC
Investigating
Please note that we have been experiencing an outage of the Voice API in the US region since 15:31 UTC. Voice API requests are returning HTTP 500 responses, and inbound calls are also affected by this outage.
Posted 2 months ago. Sep 13, 2017 - 15:47 UTC
This incident affected: Voice-related APIs (Voice API, North America (US, Canada & Caribbean) traffic).