DigitalOcean Services Status

Spaces, Spaces CDN, DOCR, LBaaS, App Platform, Functions and Mongo Backups.
Incident Report for DigitalOcean
Postmortem

Incident Summary

On Thursday, November 28, 2024, our domain registrar, Network Solutions, made an update to our digitaloceanspaces.com domain. The change made at 22:38 UTC (5:38:40PM ET) added an Extensible Provisioning Protocol (EPP) clientHold status code to our domain.

The code prevents DNS resolution until a customer contacts Network Solutions to clear the hold. The impact of the hold resulted in DigitalOcean customers being unable to sign-up/log-in via the Cloud Control Panel at the beginning of the incident and experiencing errors with Spaces buckets and dependent services, such as DigitalOcean Container Registry, App Platform, Functions, Load Balancers, and Mongo Managed Database Backups for the duration of the incident. 

Incident Summary

Root Cause: Domain registrar erroneously placed a clientHold on the digitaloceanspaces.com domain, impacting DNS resolution and causing an outage.

Impact: Traffic to DigitalOcean products (e.g. Cloud Control Panel, Spaces, App Platform, etc.) began to see intermittent failures between 11/28 22:38 UTC and 04:03 UTC, with the worst failures happening between 03:00 and 04:03 UTC as DNS cache and TTLs began expiring.

Response: DigitalOcean escalated to the domain registrar, as well as Verisign (as the authoritative domain registry for .com domains), to have the clientHold removed. 

Timeline of Events

November 28, 2024 (UTC)

  • 22:38 - Incident declared based on monitoring and customer reports of our Cloud Control Panel not loading.
  • 22:44 -  Issue detected by an internal alert
  • 22:57 - 1st customer contact regarding Cloud Control Panel impact
  • 23:57 - We noticed that our whois information for the domain had been updated on November 28 22:38 UTC.

November 29, 2024 (UTC)

  • 01:06 - We identified the clientHold status in the EPP section of our whois for digitaloceanspaces.com and confirmed in our Network Solutions account that the domain had been marked inactive.
  • 01:08 - We initiated contact with Network Solutions to have the clientHold removed.
  • 01:30 - Reached out to Verisign executives to assistance 
  • 01:42 - Cloud Control Panel functionality restored
  • 02:40 - Conference call with Verisign executive to brief them on the issue(s) we have encountered attempting to resolve the clientHold status.
  • 02:44 - Email chain established with Verisign executives. A recap of all work done, problems, and a clear issue resolution request were sent.
  • 02:44 - 04:03 - Multiple escalations between Verisign executives leading to clientHold being removed.
  • 03:00 - As DNS caches and recursive DNS server TTLs began to expire, we saw the most severe service disruptions to our customers until resolution
  • 04:03 - clientHold removed from domain
  • 04:03 - 5:38:40 - Monitoring all infrastructure to ensure healthy and full recovery. DigitalOcean functionality fully restored.

Remediation Actions

DigitalOcean teams are working on multiple types of remediation to prevent a similar incident from happening.

DigitalOcean is working with Network Solutions to understand what happened on their end that resulted in the clientHold being applied to our domain incorrectly.

In addition, we are reviewing other domain registrars as possible new homes for our domains.

Teams are also reviewing our monitoring and alerting to reduce our time to detect incidents related to registrar imposed changes and/or DNS resolution.

Posted Dec 05, 2024 - 01:39 UTC

Resolved
Our Engineering team has confirmed the full resolution of this issue. If you continue to experience problems, please open a ticket with our support team. Thank you for being so patient, and we apologize for any inconvenience.
Posted Nov 29, 2024 - 05:35 UTC
Monitoring
Our domain registrar has taken action and the digitaloceanspaces.com domain is now recovering. We have verified the domain is resolving and are seeing error rates dropping for all impacted services. Users should begin seeing services returning to normal operation.

Our Engineering teams are monitoring the situation to ensure complete recovery and we will post a final update soon.
Posted Nov 29, 2024 - 04:24 UTC
Update
Our Engineering team has identified additional downstream services seeing elevated failure rates and broken functionality. At this time, the following services are impacted:

- Spaces object accessibility in all regions with objects hosted under digitaloceanspaces.com.

- Requests to create, push, or pull to/from Container Registries

- Backups for the Mongo Managed Databases product are not successfully uploading, so users attempting to use those backups will see errors.

- Creates and resizes for Load Balancers in all regions

- New builds for App Platform in all regions

- Create or updating Functions in all regions

We are in active contact with our vendor to resolve this as quickly as possible. Any DNS record under digitaloceanspaces.com will continue to fail until this incident is resolved.

We will continue to provide updates as we have new information.
Posted Nov 29, 2024 - 03:47 UTC
Update
Our Engineering team has identified additional downstream services seeing elevated failure rates and broken functionality. At this time, the following services are impacted:

- Spaces object accessibility in all regions with objects hosted under digitaloceanspaces.com.

- Requests to create, push, or pull to/from Container Registries

- Backups for the Mongo Managed Databases product are not successfully uploading, so users attempting to use those backups will see errors.

- Creates and resizes for Load Balancers in all regions

- New builds for the App Platform in all regions

We are in active contact with our vendor to resolve this as quickly as possible. Any DNS record under digitaloceanspaces.com will continue to fail until this incident is resolved.

We will continue to provide updates as we have new information.
Posted Nov 29, 2024 - 03:36 UTC
Update
Spaces accessibility in all regions continues to be impacted and users will experience errors when attempting to interact with objects hosted under digitaloceanspaces.com. Container Registries are also impacted and users will experience errors when attempting to create, push, or pull containers. Backups for the Mongo Managed Databases product are also not successfully uploading, so users attempting to use those backups will see errors.

We are in active contact with our vendor to resolve this as quickly as possible. Any DNS record under digitaloceanspaces.com will continue to fail until this incident is resolved.

We will continue to provide updates as we have new information.
Posted Nov 29, 2024 - 03:04 UTC
Update
Spaces accessibility in all regions continues to be impacted and users will experience errors when attempting to interact with objects hosted under digitaloceanspaces.com. Backups for the Mongo Managed Databases product are also not successfully uploading, so users attempting to use those backups will see errors.

We have engaged with our upstream vendor to resolve this incident as quickly as possible and are awaiting a further update. Any DNS record under digitaloceanspaces.com will continue to fail until this incident is resolved.

We understand these are critical issues for our customers and are working as quickly as possible to resolve this.
Posted Nov 29, 2024 - 02:38 UTC
Update
Our Engineering team has successfully deployed a solution to bring the Cloud Control Panel back online. Please refresh the page a few times to clear your cache.

However, we are still experiencing issues with Spaces. We are engaging with our upstream vendor to resolve this incident as quickly as possible. Any DNS record under digitaloceanspaces.com will continue to fail until this incident is resolved. We understand this is a critical issue for our customers and are working as quickly as possible to resolve this.
Posted Nov 29, 2024 - 01:50 UTC
Identified
Our Engineering team has identified this as a domain registrar issue for the entire digitaloceanspaces.com domain. We are engaging with our upstream vendor to resolve this incident as quickly as possible. Any DNS record under digitaloceanspaces.com will continue to fail until this incident is resolved. We understand this is a critical issue for our customers.

Our Cloud Control Panel is dependent on our Spaces product and is also experiencing an outage due to this issue.

Due to the Cloud Control Panel being down, if users need to submit a support ticket, they may do so here: https://www.digitalocean.com/company/contact/support
Posted Nov 29, 2024 - 01:31 UTC
Update
Our Engineering team is continuing to investigate this issue and all possible paths to remediation, as well as confirm if this issue is related to an upstream provider outage. We understand this is a critical issue for our customers.

Due to the Cloud Control Panel being down, if users need to submit a support ticket, they may do so here: https://www.digitalocean.com/company/contact/support
Posted Nov 29, 2024 - 01:10 UTC
Update
We are continuing to investigate this issue.
Posted Nov 29, 2024 - 00:34 UTC
Investigating
Our Engineering team is currently investigating an issue affecting portions of the Cloud Panel across all regions. Additionally, some Space Buckets may be inaccessible via the CDN. Users may experience difficulties accessing the cloud control panel, encounter DNS-related problems, or face interruptions when managing or accessing Spaces and Spaces CDN through.

We apologize for the inconvenience and will share an update once we have more information.
Posted Nov 29, 2024 - 00:21 UTC
This incident affected: App Platform (Global), Managed Databases (Global), Load Balancers (Global), Functions (Global), Spaces (Global, AMS3, FRA1, NYC3, SFO2, SFO3, SGP1, SYD1, BLR1), Container Registry (Global, AMS3, BLR1, FRA1, NYC3, SFO2, SFO3, SGP1, SYD1), Spaces CDN (Global, AMS3, FRA1, NYC3, SFO3, SGP1, SYD1), and Cloud Control Panel.