Droplet Creation Disabled
Incident Report for DigitalOcean

The Incident

At 19:06 UTC on 2017-07-11, Droplet creation was delayed following the deployment of an internal image management service. The change placed high load on the image replication services, which in turn delayed new Droplet creates.

While mitigating the issue and reverting the change, we disabled new Droplet creation, then re-enabled it once load on the image services had dropped. In total, 502 users were affected, with 4,755 delayed creates.

Timeline of Events

  • 18:50 UTC - Deployment of image management service with new feature
  • 19:06 UTC - DigitalOcean CloudOps team receives alert of delayed creates
  • 19:54 UTC - Identify large number of outstanding image replications
  • 20:05 UTC - Droplet creates disabled
  • 20:17 UTC - Image management replication service stopped
  • 20:35 UTC - Invalid image replication events removed from queue
  • 20:39 UTC - Delayed creates begin processing
  • 20:40 UTC - Previous version of image management service deployed
  • 20:48 UTC - Droplet creates re-enabled

Future Measures

The change to the image replication service increased our replication factor too quickly, overloading the system as it tried to reach the desired replication level. Moving forward, we will introduce rate-limiting so that changes like these are handled more gracefully by the system.

In addition, we are working to make sure development and testing environments more closely mirror production, so that we can accurately predict the impact of replication factor changes.
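
As an illustration of the rate-limiting idea described above, here is a minimal Python sketch that raises a replication factor in small steps and waits for the replication backlog to drain before taking the next step. It is not DigitalOcean's implementation, which this report does not describe. The names ramp_steps, apply_ramp, set_factor, pending_replications and max_pending, along with the default values, are all hypothetical.

```python
import time
from typing import Callable, Iterator


def ramp_steps(current: int, target: int, max_step: int = 1) -> Iterator[int]:
    """Yield intermediate replication factors, moving at most max_step at a time."""
    if max_step < 1:
        raise ValueError("max_step must be at least 1")
    direction = 1 if target > current else -1
    while current != target:
        step = min(max_step, abs(target - current))
        current += direction * step
        yield current


def apply_ramp(
    current: int,
    target: int,
    set_factor: Callable[[int], None],
    pending_replications: Callable[[], int],
    max_pending: int,
    max_step: int = 1,
    wait: Callable[[float], None] = time.sleep,
    poll_seconds: float = 30.0,
) -> None:
    """Apply each intermediate factor, then pause until the backlog is small.

    set_factor and pending_replications are hypothetical hooks into the
    replication service; they are assumptions, not a real API.
    """
    for factor in ramp_steps(current, target, max_step):
        set_factor(factor)
        # Hold the next increase until outstanding replications drop
        # to max_pending or below.
        while pending_replications() > max_pending:
            wait(poll_seconds)
```

With this approach, a change from a factor of 2 to 5 would be applied as 3, 4, 5 rather than in one jump, and each increase is gated on the number of outstanding replications. This bounds the extra load that any single step can add.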

In Conclusion

We understand that the delayed creates may have interrupted your work or business. We apologize to all affected users for this incident.

Posted Jul 13, 2017 - 22:15 UTC

Resolved
Our engineers have fully resolved the issue causing Droplet event processing delays. We apologize for any problems this may have caused you. If you're still experiencing delays at this time, please open a support ticket.
Posted Jul 11, 2017 - 21:30 UTC
Monitoring
We have isolated the issue causing Droplet event processing delays, and events should now be proceeding as normal. We have re-enabled Droplet creation and will continue to monitor events. If you're still experiencing delays, please open a support ticket.
Posted Jul 11, 2017 - 20:53 UTC
Investigating
Our engineers are investigating Droplet event processing issues. During this time, Droplet creation is disabled, and Droplet create events that were already running may take longer than expected to finish.
Posted Jul 11, 2017 - 20:06 UTC
This incident affected: Regions (Global) and Services (Event Processing).