Status History
Dec 30 2013
Dec 29 2013
Dec 28 2013
Dec 27 2013
Dec 26 2013
Dec 25 2013
Dec 24 2013 15:52 UTC

Networking issue in AMS.

We have confirmed that the AMS network has returned to normal and ask that any further issues be reported via a support ticket.

Thank you again for your patience, and we wish all our customers a happy and safe holiday!

History

We are investigating reports of a networking issue in our AMS region and will update the status as we know more.

We have temporarily disabled event processing in this region and will work through existing events as the region returns to normal.

investigating 2013-12-24 15:53:13 UTC

We have escalated this network issue to our provider; however, this regional issue may back up events in other regions while our scheduler catches up. We appreciate your patience.

update 2013-12-24 16:56:16 UTC

We have confirmed that the AMS network has returned to normal and ask that any further issues be reported via a support ticket.

Thank you again for your patience, and we wish all our customers a happy and safe holiday!

resolved 2013-12-24 19:09:36 UTC
Dec 23 2013
Dec 22 2013
Dec 21 2013 13:56 UTC

NYC2 - Network Interruption

Juniper has confirmed that a known bug caused the reboot of a specific line card.

INITIAL PROGNOSIS from Juniper:

All of the messages preceding the FPC reboot came from the LU ASIC functions (MAC lookup, next-hop, hashing) and the data structure address space (zones, contexts, parcel) used by the packet processing engine (PPE) for processing packet headers in the ASIC. Eventually, the LUCHIP found itself in a wedged condition, as the PPE was clearly having some problem with the data structure memory space, and the FPC had to be rebooted to correct this condition.

There is a known software issue in which the LUCHIP becomes wedged when memory is not released and a timeout occurs. A maintenance release fixing this known issue is due next week, and a service release with the same fix is already available.

NEXT ACTION DigitalOcean:

We will schedule a maintenance window in the next 1-2 weeks to perform the software upgrade and bring the NYC2 network configuration to full redundancy, matching our AMS2 network.

Network services will be more resilient after these upcoming maintenance changes. As always, we strive for 100% uptime and monitor our network aggressively to ensure smooth service.

History

We've determined that a line card in our Juniper core router rebooted this morning at 08:21 AM EST. The network fully recovered by 08:39 AM EST. We are currently working with Juniper to fully diagnose the cause and perform any necessary work to ensure this bug does not recur.

In the near future we will be updating the NYC2 network configuration to match our new AMS2 configuration, where the network topology has been tested and is working flawlessly.

We apologize for the brief interruption and are closely monitoring the environment. We will resolve this message shortly after Juniper verifies the problem.

monitoring 2013-12-21 14:00:12 UTC

Juniper has confirmed that a known bug caused the reboot of a specific line card.

INITIAL PROGNOSIS from Juniper:

All of the messages preceding the FPC reboot came from the LU ASIC functions (MAC lookup, next-hop, hashing) and the data structure address space (zones, contexts, parcel) used by the packet processing engine (PPE) for processing packet headers in the ASIC. Eventually, the LUCHIP found itself in a wedged condition, as the PPE was clearly having some problem with the data structure memory space, and the FPC had to be rebooted to correct this condition.

There is a known software issue in which the LUCHIP becomes wedged when memory is not released and a timeout occurs. A maintenance release fixing this known issue is due next week, and a service release with the same fix is already available.

NEXT ACTION DigitalOcean:

We will schedule a maintenance window in the next 1-2 weeks to perform the software upgrade and bring the NYC2 network configuration to full redundancy, matching our AMS2 network.

Network services will be more resilient after these upcoming maintenance changes. As always, we strive for 100% uptime and monitor our network aggressively to ensure smooth service.

resolved 2013-12-21 20:31:17 UTC
Dec 20 2013
Dec 19 2013
Dec 18 2013
Dec 17 2013
Dec 16 2013
Dec 15 2013
Dec 14 2013 07:57 UTC

SFO1 Datacenter

We have returned all peers to service. We are still working with Cisco on our equipment to ensure it is back to full service, but we do not expect any additional issues at this time.

History

We are investigating reports of problems in the SFO1 datacenter. We will update this status as we know more.

investigating 2013-12-14 07:57:32 UTC

We were able to return stability to SFO1.

We are currently working with our peers, vendors and their vendors to isolate the cause of the instability.

investigating 2013-12-14 09:18:14 UTC

We have returned all peers to service. We are still working with Cisco on our equipment to ensure it is back to full service, but we do not expect any additional issues at this time.

resolved 2013-12-14 11:29:56 UTC
Dec 13 2013
Dec 12 2013
Dec 11 2013
Dec 10 2013
Dec 09 2013
Dec 08 2013
Dec 07 2013
Dec 06 2013
Dec 05 2013
Dec 04 2013
Dec 03 2013
Dec 02 2013 17:55 UTC

Scheduling Delay.

We have deployed code to redistribute our scheduling events and expect the backlog to process over the next hour. Events other than creates should now process in their usual time.

History

Due to overwhelming demand in our new AMS2 data center, we are experiencing a 15-minute delay in event processing. We have deployed code to redistribute our scheduling and expect events to clear within the next 30 minutes.

update 2013-12-02 17:57:20 UTC

We have deployed code to redistribute our scheduling events and expect the backlog to process over the next hour. Events other than creates should now process in their usual time.

update 2013-12-02 19:59:54 UTC

resolved 2013-12-03 10:02:07 UTC
Dec 01 2013
Nov 30 2013 08:24 UTC

Black50 Promo Update

Unfortunately, the Black50 promo code has now expired, as it was only available until midnight Eastern time on Friday (November 29th).

From time to time we run promotional offers on our social media channels. If you want to be notified about the next promotional campaign we run, we recommend following our Twitter account.

Thank you & Happy Holidays

History

Unfortunately, the Black50 promo code has now expired, as it was only available until midnight Eastern time on Friday (November 29th).

From time to time we run promotional offers on our social media channels. If you want to be notified about the next promotional campaign we run, we recommend following our Twitter account.

Thank you & Happy Holidays

update 2013-11-30 08:25:24 UTC

resolved 2013-12-01 01:36:41 UTC
Nov 29 2013
Nov 28 2013
Nov 27 2013 03:00 UTC

AMS1 Maintenance Notice

DigitalOcean will be under maintenance on November 27th, during which time we will be making preparations to greatly expand capacity in the Amsterdam region.

During this time we will preemptively disable event processing for approximately 30 minutes while making some of the required changes. This is done to ensure a consistent state in the cloud during this process.

Because we will be making minor adjustments in the AMS1 region, there may be some periods of increased latency or packet loss, but these will be kept to a minimum and should not be customer-impacting.

If you have any questions, or if you experience any issues outside of what is covered in this notice, please open a support ticket to inform us.

Maintenance Window:
Date: November 27

Start Time: 3:00AM UTC
End Time: 7:00AM UTC

History

DigitalOcean will be under maintenance on November 27th, during which time we will be making preparations to greatly expand capacity in the Amsterdam region.

During this time we will preemptively disable event processing for approximately 30 minutes while making some of the required changes. This is done to ensure a consistent state in the cloud during this process.

Because we will be making minor adjustments in the AMS1 region, there may be some periods of increased latency or packet loss, but these will be kept to a minimum and should not be customer-impacting.

If you have any questions, or if you experience any issues outside of what is covered in this notice, please open a support ticket to inform us.

Maintenance Window:
Date: November 27

Start Time: 3:00AM UTC
End Time: 7:00AM UTC

investigating 2013-11-27 02:47:59 UTC

resolved 2013-11-27 08:13:56 UTC
Nov 26 2013 13:44 UTC

AMS1 Inbound DDoS Attack

We were able to mitigate the attack manually and are reviewing why our security system did not automatically block this traffic. At this time the network should be restored to 100% connectivity, without any increase in packet loss or latency.

We will resolve this incident in one hour if all systems remain stable.

History

At this time there is a 20Gbps+ inbound denial of service attack that was not automatically mitigated by our security system. We are currently working to block the attack traffic. In the meantime, customers will experience higher latency or packet loss while the attack continues.

We will update this message as soon as the status of this event changes.

investigating 2013-11-26 13:45:42 UTC

We were able to mitigate the attack manually and are reviewing why our security system did not automatically block this traffic. At this time the network should be restored to 100% connectivity, without any increase in packet loss or latency.

We will resolve this incident in one hour if all systems remain stable.

monitoring 2013-11-26 14:13:24 UTC

resolved 2013-11-26 14:49:42 UTC
Nov 25 2013 02:05 UTC

NYC1 power outage.

All servers are now back online, events are re-enabled, and all control panel and API requests have been restored.

We will be updating our blog with a detailed post-mortem of this outage shortly.

History

We have engaged our engineering teams to investigate temporary scheduling delays resulting from a power outage at our NYC1 data center.

investigating 2013-11-25 02:07:32 UTC

We are experiencing a power outage at NYC1 and have engaged Equinix to investigate.

investigating 2013-11-25 02:27:11 UTC

Equinix reports that UPS 7, which has a redundant power supply, failed. It is now back online, and we are working with their engineers to restore service.

update 2013-11-25 03:40:10 UTC

All servers are now back online, events are re-enabled, and all control panel and API requests have been restored.

We will be updating our blog with a detailed post-mortem of this outage shortly.

resolved 2013-11-25 07:00:06 UTC
Nov 24 2013
Nov 23 2013
Nov 22 2013
Nov 21 2013
Nov 20 2013
Nov 19 2013 17:43 UTC

Routing Issues In AMS1 Region

We have identified the network segment responsible for routing instability today and will roll out a permanent fix within a network maintenance window. The good news is that we do not expect any further network interruptions as we finalize the configuration in response to today's events.

Thank you for being patient as we finalize the network changes to restore network connectivity to 100%.

History

Our network engineers are currently investigating routing issues in the AMS1 region.

We will provide more details as soon as they become available.

Thank you for your patience.

investigating 2013-11-19 17:46:05 UTC

We are currently working with vendors to determine the exact cause and proper resolution of this issue.

As mentioned earlier, we will continue to provide updates on the situation as they become available.

investigating 2013-11-19 18:04:21 UTC

We have identified the network segment responsible for routing instability today and will roll out a permanent fix within a network maintenance window. The good news is that we do not expect any further network interruptions as we finalize the configuration in response to today's events.

Thank you for being patient as we finalize the network changes to restore network connectivity to 100%.

resolved 2013-11-20 07:46:27 UTC
Nov 18 2013
Nov 17 2013
Nov 16 2013
Nov 15 2013
Nov 14 2013
Nov 13 2013 00:34 UTC

Events Scheduling Temporarily Delayed

We've cleared out the backlog of events on the backend scheduler and all event processing should now be back to normal.

History

There is currently a delay in the backend scheduler processing jobs which is causing events to be delayed in their processing.

We are investigating the issue and it should be resolved in the next 10-15 minutes at which point event processing will return back to its normal rate.

investigating 2013-11-13 00:35:20 UTC

We've cleared out the backlog of events on the backend scheduler and all event processing should now be back to normal.

resolved 2013-11-13 00:49:01 UTC
Nov 12 2013
Nov 11 2013 05:00 UTC

Emergency Maint: New York 2 Internet Provider Maintenance (Nov 11 00:00-03:00 EST)

Finished. We will continue to monitor it closely throughout the rest of the evening and tomorrow.

History

Start: 2013-11-11 00:00:00 EST
End: 2013-11-11 03:00:00 EST

We will be adding a new peer, NTT, as well as doing repair work on our other peers.

Expected Impact:

There will be some increased latency and possible packet loss as traffic is rerouted to other peers to perform the work.

If you have any questions, please do not hesitate to contact support.

Thanks,
DigitalOcean Networking Team

investigating 2013-11-11 04:38:26 UTC

Changes are complete and we are closely monitoring.

monitoring 2013-11-11 07:03:53 UTC

Finished. We will continue to monitor it closely throughout the rest of the evening and tomorrow.

resolved 2013-11-11 07:12:37 UTC
Nov 10 2013
Nov 09 2013
Nov 08 2013
Nov 07 2013
Nov 06 2013
Nov 05 2013
Nov 04 2013
Nov 03 2013
Nov 02 2013
Nov 01 2013 05:00 UTC

NYC2 Provider Network Maintenance #2 (11/01/2013 UTC)

We have completed all of our work but will be monitoring in case action is needed.

History

Start: Friday 01 November, 05:00 UTC
End: Friday 01 November, 08:00 UTC

We will reconfigure our peer edge routers and return nLayer to service after repairs.

Expected Impact: 5-10 Minutes of Network Disconnect

During this maintenance we will be forced to disconnect the network for 5-10 minutes as we reconfigure the equipment to properly support the configuration. We have been working closely with Juniper to determine the best course of action that creates minimal impact on the network. During tonight's maintenance we will need to disconnect the network to perform the necessary steps.

If the maintenance goes according to plan this will restore full redundancy to the Edge routers and allow us to continue to expand capacity and provide reliable service to our customers. We will provide a full update to our original postmortem after this maintenance is complete.

We expect 1 more maintenance window during the weekend to complete all of the necessary changes and restore the NY2 network to 100% service. We understand how important network connectivity is to public cloud services and are doing absolutely everything possible to ensure that going forward our customers experience the least amount of interruptions possible.

Thank you for your patience as we finalize these changes,

DigitalOcean Networking Team

investigating 2013-11-01 04:00:28 UTC

We have completed all of our work but will be monitoring in case action is needed.

monitoring 2013-11-01 07:53:43 UTC

resolved 2013-11-01 08:53:24 UTC
Oct 31 2013 02:18 UTC

NYC2 Provider Network Maintenance

We were not satisfied with the quality of the repairs and have canceled this maintenance.

History

Start: Wednesday 30 October, 22:00 EDT
End: Wednesday 30 October, 23:59 EDT

We will be returning our peer NLayer back into service after repairs.

Expected Impact:

We are expecting no impact although routes will be converging to use the peer and there can be some latency and packet loss as that happens.

Additionally, we have written up a full postmortem on the networking issues that have affected the NY2 region; it can be found on our blog:
https://www.digitalocean.com/blog_posts/ny2-network-upgrade-postmortem

If you have any questions, please do not hesitate to contact support.

Thanks,
DigitalOcean Networking Team

investigating 2013-10-31 02:19:31 UTC

We were not satisfied with the quality of the repairs and have canceled this maintenance.

resolved 2013-10-31 07:21:25 UTC
Oct 30 2013
Oct 29 2013
Oct 28 2013 20:55 UTC

NYC2 Network Issues

After escalating the issue to Juniper's engineers, we isolated it to a combination of two things that occurred yesterday. The first was an issue on nLayer's network, which required us to remove them from the routing mix.

Because we have redundant providers and capacity, this change should have rerouted all customers being serviced by nLayer onto other providers with no impact to the network.

However, further investigation uncovered that jflow, which was enabled on the cores, was preventing the routes from re-converging quickly enough to avoid packet loss. jflow is used for monitoring traffic and ties into our DDoS prevention setup.

It appears that with a large enough network, the jflow configuration prevents the routers from adding new routes in a normal timeframe. The recommended fix is either to decrease the jflow sampling rate or to add a hardware card (MS-DPC) dedicated to sampling. This offloads the sampling workload from the core router itself onto the card, which should prevent any delay in how routes re-converge after a network topology change.

We are now working with Juniper engineers to determine the best fit in terms of a hardware fix or a configuration change to decrease the sample rate. In the meantime, we will be modifying the configuration to remove jflow, as the current configuration of the core routers doesn't allow the network to re-converge with zero packet loss.

History

We are investigating a networking issue that is impacting the communication between our NYC2 data center and our other regions.

Our engineers are currently working to resolve this issue.

We will provide updates as we have more details on this matter.

investigating 2013-10-28 21:00:27 UTC

After escalating the issue to Juniper's engineers, we isolated it to a combination of two things that occurred yesterday. The first was an issue on nLayer's network, which required us to remove them from the routing mix.

Because we have redundant providers and capacity, this change should have rerouted all customers being serviced by nLayer onto other providers with no impact to the network.

However, further investigation uncovered that jflow, which was enabled on the cores, was preventing the routes from re-converging quickly enough to avoid packet loss. jflow is used for monitoring traffic and ties into our DDoS prevention setup.

It appears that with a large enough network, the jflow configuration prevents the routers from adding new routes in a normal timeframe. The recommended fix is either to decrease the jflow sampling rate or to add a hardware card (MS-DPC) dedicated to sampling. This offloads the sampling workload from the core router itself onto the card, which should prevent any delay in how routes re-converge after a network topology change.

We are now working with Juniper engineers to determine the best fit in terms of a hardware fix or a configuration change to decrease the sample rate. In the meantime, we will be modifying the configuration to remove jflow, as the current configuration of the core routers doesn't allow the network to re-converge with zero packet loss.

resolved 2013-10-29 16:23:57 UTC
Oct 27 2013
Oct 26 2013
Oct 25 2013 19:01 UTC

NYC2 Network Issue

Working with Juniper engineers, we have tracked down the root cause of the issue to the way in which the redundancy between the two core routers was set up.

When new racks were being brought up, the new top-of-rack switches caused an issue on the Junipers that created high load, and the routers were unable to keep their redundancy protocols up. This caused each router to fail intermittently, and without redundancy they failed to gracefully fail over to one another, which resulted in packet loss for customers.

We've since stabilized the connection, and there are no other planned additions to the network layer.

We are continuing our investigation into the root cause with Juniper to diagnose the specific condition that led the redundancy protocols to heavily tax the routers' CPUs, so that we can update our configurations with a permanent solution.

History

We are investigating an issue in our NY2 region related to the new core routers.

We have escalated the issue to Juniper's engineers as well.

Currently we are seeing an issue where the core routers are not converging, which is causing packet loss for customers.

investigating 2013-10-25 19:02:18 UTC

Working with Juniper engineers, we have tracked down the root cause of the issue to the way in which the redundancy between the two core routers was set up.

When new racks were being brought up, the new top-of-rack switches caused an issue on the Junipers that created high load, and the routers were unable to keep their redundancy protocols up. This caused each router to fail intermittently, and without redundancy they failed to gracefully fail over to one another, which resulted in packet loss for customers.

We've since stabilized the connection, and there are no other planned additions to the network layer.

We are continuing our investigation into the root cause with Juniper to diagnose the specific condition that led the redundancy protocols to heavily tax the routers' CPUs, so that we can update our configurations with a permanent solution.

resolved 2013-10-25 23:38:59 UTC
Oct 24 2013
Oct 23 2013
Oct 22 2013 18:58 UTC

NYC2 Network - Intermittent Issues

We've worked with engineers at Juniper to diagnose the issue that was present on the network. Our top-of-rack switches are all Ciscos, as were our prior cores.

For this core upgrade we chose Juniper at the core because it provides better configuration management tools, offers great reliability, and has been increasingly taking market share from Cisco.

Unfortunately, the network maintenance in which we replaced the old Cisco cores with the new Juniper equipment required a complete rebuild of the configuration, not only to get the new cores into their redundant configuration, but also to take care of all of the routing that then goes out to the top-of-rack Ciscos.

We ran into a vendor interoperability issue that we diagnosed down to a single top-of-rack switch, which was preventing the hypervisors and the virtual servers on that switch from establishing some types of connections; this was most evident with SSL connections.

Unfortunately, this configuration issue had some latent effects on the core routers, as they were unable to route some traffic; however, the majority of the issues were isolated to this single rack of hypervisors.

We have rolled out a new configuration to the top-of-rack switch as well as to the cores, and have updated the router OS as well.

The issues have now been resolved and no customers should be affected.

We are continuing to review the issue that arose with Juniper engineers to ensure that there are no remaining issues on the Juniper Cores.

History

We are investigating an issue with the NYC2 network where we are seeing intermittent problems on both the public and private networks. The most common issue we are seeing is the inability to establish an SSL connection, or an interruption of the connection after it is established.

We have escalated the issue to Juniper as well to review how the new cores that were put into the network are handling the load.

We will provide updates as we have more information on this issue.

investigating 2013-10-22 19:00:31 UTC

Real-time status updates regarding the intermittent NYC2 network issues will be provided in the DigitalOcean public IRC channel (#digitalocean). You can access the web-based IRC interface by visiting the following URL: http://webchat.freenode.net/?channels=digitalocean&uio=d4

We will provide a formal update on our status page shortly. We apologize for any inconvenience this issue has caused.

investigating 2013-10-22 19:30:12 UTC

We've worked with engineers at Juniper to diagnose the issue that was present on the network. Our top-of-rack switches are all Ciscos, as were our prior cores.

For this core upgrade we chose Juniper at the core because it provides better configuration management tools, offers great reliability, and has been increasingly taking market share from Cisco.

Unfortunately, the network maintenance in which we replaced the old Cisco cores with the new Juniper equipment required a complete rebuild of the configuration, not only to get the new cores into their redundant configuration, but also to take care of all of the routing that then goes out to the top-of-rack Ciscos.

We ran into a vendor interoperability issue that we diagnosed down to a single top-of-rack switch, which was preventing the hypervisors and the virtual servers on that switch from establishing some types of connections; this was most evident with SSL connections.

Unfortunately, this configuration issue had some latent effects on the core routers, as they were unable to route some traffic; however, the majority of the issues were isolated to this single rack of hypervisors.

We have rolled out a new configuration to the top-of-rack switch as well as to the cores, and have updated the router OS as well.

The issues have now been resolved and no customers should be affected.

We are continuing to review the issue that arose with Juniper engineers to ensure that there are no remaining issues on the Juniper Cores.

resolved 2013-10-22 23:06:09 UTC
Oct 22 2013 02:00 UTC

NYC2 Public Network Upgrade - Part II (10/21/2013 22:00EDT)

Maintenance has been completed.

History

Start: Monday 21 October, 22:00 EDT
End: Tuesday 22 October, 04:00 EDT

This is the second part of our scheduled maintenance to upgrade the core routers in the NY2 region, which will add capacity for additional peering and provide 5-6x the current routing capacity.

Expected Impact:

There will be several short periods of high latency and packet loss lasting 1-3 minutes as providers and hypervisors are re-routed to the new hardware. This will be spread out over the above window.

The shared private network will not be impacted; all communication between droplets will remain normal.

If you have any questions, please do not hesitate to contact support.

Thanks,
DigitalOcean Networking Team

investigating 2013-10-22 01:47:51 UTC

Completed

resolved 2013-10-22 09:19:59 UTC

Maintenance has been completed.

resolved 2013-10-22 23:00:14 UTC
Oct 21 2013
Oct 20 2013
Oct 19 2013 13:40 UTC

Networking Issue in NY1 Facility

After allowing time to ensure no further issues arise, we can now confirm the issue was completely resolved by disabling the affected provider.

We will work with the provider in question to re-enable the uplink once it has been shown that the issue has been resolved within their infrastructure.

History

At this time, we are currently investigating an issue in our NY1 facility.

We are working to determine the exact cause/resolution of the issue and will provide details as soon as they become available.

investigating 2013-10-19 13:43:03 UTC

We have determined the cause was related to a single provider which is now disabled.

At this time, we are seeing full service restoration, but we will continue to monitor the situation and provide updates if any further issues should arise.

monitoring 2013-10-19 13:59:44 UTC

After allowing time to ensure no further issues arise, we can now confirm the issue was completely resolved by disabling the affected provider.

We will work with the provider in question to re-enable the uplink once it has been shown that the issue has been resolved within their infrastructure.

resolved 2013-10-19 15:08:27 UTC
Oct 18 2013 05:36 UTC

NYC2 Public Network Upgrade

This network maintenance activity has been canceled and will be rescheduled in a few days. We'll send out a notification before it takes place.

History

Start: Thursday October 17, 22:00 EDT
End: Friday October 18, 06:00 EDT

We will be proactively upgrading our peering and core routers to prepare for future features and capacity.

Expected Impact:

There will be several short periods of high latency and packet loss lasting 1-3 minutes as providers and hypervisors are re-routed to the new hardware. This will be spread out over the above window.

The shared private network will not be impacted; all communication between droplets will remain normal.

If you have any questions, please do not hesitate to contact support.

Thanks,
DigitalOcean Networking Team

issue 2013-10-18 05:37:59 UTC

This network maintenance activity has been canceled and will be rescheduled in a few days. We'll send out a notification before it takes place.

resolved 2013-10-18 08:46:47 UTC
Oct 17 2013
Oct 16 2013 13:48 UTC

www.digitalocean.com outage

www.digitalocean.com was down from 3:30 AM EDT until approximately 5:00 AM EDT. The cause of the outage was a high volume of activity generating too much load on the application servers. Engineering has resolved the issue and the site is functioning normally. No customer droplets were affected by this outage.

History

www.digitalocean.com was down from 3:30 AM EDT until approximately 5:00 AM EDT. The cause of the outage was a high volume of activity generating too much load on the application servers. Engineering has resolved the issue and the site is functioning normally. No customer droplets were affected by this outage.

resolved 2013-10-16 13:49:42 UTC
Oct 15 2013
Oct 14 2013
Oct 13 2013
Oct 12 2013
Oct 11 2013
Oct 10 2013
Oct 09 2013
Oct 08 2013 20:36 UTC

Scheduled Maintenance: DigitalOcean.com

We will be upgrading our website and database Wednesday (10/9/13) between the hours of 6-8pm EST. The website will be down for approximately 10 minutes. No customer virtual servers will be affected during the maintenance.

History

We will be upgrading our website and database Wednesday (10/9/13) between the hours of 6-8pm EST. The website will be down for approximately 10 minutes. No customer virtual servers will be affected during the maintenance.

update 2013-10-08 20:39:08 UTC

resolved 2013-10-14 23:51:33 UTC
Oct 07 2013
Oct 06 2013
Oct 05 2013
Oct 04 2013 13:22 UTC

Delays with the Events Scheduler

We have re-enabled creates and the scheduler is now processing events again in a reasonable amount of time.

The queue is now down to normal levels, so this issue will be marked as resolved.

History

Our engineers are currently investigating delays with the events scheduler.

We will provide updates as soon as they become available.

issue 2013-10-04 13:24:24 UTC

Creates have been temporarily disabled while we work on reducing the size of the queue.

update 2013-10-04 13:26:46 UTC

We have re-enabled creates and the scheduler is now processing events again in a reasonable amount of time.

The queue is now down to normal levels, so this issue will be marked as resolved.

resolved 2013-10-04 13:57:47 UTC
Oct 03 2013
Oct 02 2013 20:10 UTC

DDoS Attack on DigitalOcean.com

There is an on-going DDOS attack against digitalocean.com. No hypervisors or customer virtual servers should currently be affected by this attack.

History

There is an on-going DDOS attack against digitalocean.com. No hypervisors or customer virtual servers should currently be affected by this attack.

investigating 2013-10-02 20:11:16 UTC

resolved 2013-10-02 20:39:53 UTC
Oct 02 2013 13:47 UTC

Amsterdam Capacity Replenishment

User creates have been reopened in Amsterdam for droplets up to and including 32GB. Thanks for your patience!

History

We have disabled droplet creation in Amsterdam due to diminished capacity.

We are currently working to add capacity in our Amsterdam datacenter and will be re-enabling creates as soon as possible.

There are no actions that need to be taken by customers, and no existing droplets will be affected.

issue 2013-10-02 13:50:56 UTC

User creates have been reopened in Amsterdam for droplets up to and including 32GB. Thanks for your patience!

resolved 2013-10-07 19:34:38 UTC
Oct 01 2013
Sep 30 2013
Sep 29 2013
Sep 28 2013
Sep 27 2013
Sep 26 2013
Sep 25 2013
Sep 24 2013
Sep 23 2013 16:50 UTC

Slow Delivery of emails due to Gmail Issue

Currently, delivery of emails from our platform is delayed due to an ongoing issue with Gmail.

We use Gmail for our mail delivery service and they have confirmed that they are experiencing an issue today:
http://goo.gl/6d49Co

Once Google has resolved this issue, all emails will be delivered instantly; until then, emails related to droplet creates, support tickets, invoices, and so forth will unfortunately be delayed by up to several hours.

History

Currently, delivery of emails from our platform is delayed due to an ongoing issue with Gmail.

We use Gmail for our mail delivery service and they have confirmed that they are experiencing an issue today:
http://goo.gl/6d49Co

Once Google has resolved this issue, all emails will be delivered instantly; until then, emails related to droplet creates, support tickets, invoices, and so forth will unfortunately be delayed by up to several hours.

issue 2013-09-23 16:52:04 UTC

resolved 2013-09-24 14:28:41 UTC
Sep 22 2013
Sep 21 2013
Sep 20 2013
Sep 19 2013
Sep 18 2013
Sep 17 2013 13:58 UTC

Issue with the Events Scheduler

Pending events have been processed, and creates have once again been enabled.

History

Our engineers are currently investigating an issue with the events scheduler.

We are working to ensure all existing scheduled events are processed properly, and have temporarily disabled new creates until the issue is resolved.

We will provide updates as soon as they become available.

investigating 2013-09-17 14:00:04 UTC

The issues with the scheduler appear to have been resolved.

We estimate the queue of current events should be processed in approximately 15 minutes from the time of this update.

Once the pending events have all been processed, we will re-enable suspended creates.

monitoring 2013-09-17 14:59:11 UTC

Pending events have been processed, and creates have once again been enabled.

resolved 2013-09-17 15:29:07 UTC
Sep 16 2013
Sep 15 2013
Sep 14 2013 20:52 UTC

Droplet Creation Temporarily Suspended in All Regions

We are currently investigating possible issues in the droplet creation process. We are performing some system maintenance at this time and have disabled droplet creation in all regions until our maintenance is completed. We will provide further updates as they become available.

History

We are currently investigating possible issues in the droplet creation process. We are performing some system maintenance at this time and have disabled droplet creation in all regions until our maintenance is completed. We will provide further updates as they become available.

investigating 2013-09-14 20:53:09 UTC

resolved 2013-09-14 20:57:36 UTC
Sep 13 2013
Sep 12 2013 07:00 UTC

Minimal Impact: New York 1 Internet Provider Maintenance

Completed.

History

To improve network performance and stability we will be disconnecting Cogent as a network provider in our New York 1 (NY1) data center. We are also adding a new connection with TeliaSonera.

Expected Impact:
We are expecting minimal impact but there can be some short periods of latency and packet loss for 1-3 minutes as inbound routes converge for the changing of providers.

Resolution:
We are constantly working to improve the performance of our network. As a result of this change we expect to see improved ping times, lower latency and better throughput.

If you have any questions, please do not hesitate to contact support.

Thanks,
DigitalOcean Networking Team

investigating 2013-09-12 06:54:54 UTC

Completed.

resolved 2013-09-12 08:47:49 UTC
Sep 11 2013
Sep 10 2013
Sep 09 2013
Sep 08 2013
Sep 07 2013 16:21 UTC

Droplet Creation Temporarily Suspended in All Regions

We have successfully completed our maintenance, and creates have been re-enabled. Thank you for your patience.

History

We are currently investigating possible issues in the droplet creation process. We are performing some system maintenance at this time and have disabled droplet creation in all regions until our maintenance is completed. We estimate this process to take about one hour, and will provide further updates as they become available.

investigating 2013-09-07 16:24:05 UTC

We have successfully completed our maintenance, and creates have been re-enabled. Thank you for your patience.

resolved 2013-09-07 17:45:32 UTC
Sep 06 2013 11:00 UTC

AMS Capacity Restrictions

We have re-enabled droplet creation in the Amsterdam region. All users will be able to create new droplets in Amsterdam at this time.

History

We have disabled droplet creation in Amsterdam. We are currently adding capacity in our Amsterdam Data Center and will be re-enabling creates as soon as possible. There are no actions that need to be taken by customers.

investigating 2013-09-06 13:08:29 UTC

We have re-enabled droplet creation in the Amsterdam region. All users will be able to create new droplets in Amsterdam at this time.

resolved 2013-09-08 21:18:02 UTC
Sep 05 2013
Sep 04 2013 19:39 UTC

Network Event: NY1 Region

We have traced the network event to an issue related to a broken session between the redundant core network routers.

We are reviewing to see if any configuration changes are needed as a result of this issue.

History

We are investigating a connectivity issue that is affecting the NY1 region.

As soon as we have more information we will provide an update on the issue and the resolution.

investigating 2013-09-04 19:40:20 UTC

We have traced the network event to an issue related to a broken session between the redundant core network routers.

We are reviewing to see if any configuration changes are needed as a result of this issue.

resolved 2013-09-04 19:42:34 UTC
Sep 04 2013 02:00 UTC

Minimal Impact: Amsterdam 1 Internet Provider Maintenance

Maintenance completed.

History

Maintenance window: September 4, 2013 02:00-03:00 UTC (GMT+0)

To improve network performance and stability we will be disconnecting Cogent as a network provider in our Amsterdam 1 (AMS1) datacenter. We have already added several new network providers over the past couple of months that have resulted in improved performance.

Expected Impact:
We are expecting minimal impact but there can be some short periods of latency and packet loss as inbound routes are removed from the Cogent backbone.

Resolution:
We are constantly working to improve the performance of our network. As a result of this change we expect to see improved ping times, lower latency and better throughput.

If you have any questions, please do not hesitate to contact support.

Thanks,
DigitalOcean Networking Team

investigating 2013-09-04 01:33:33 UTC

Maintenance completed.

resolved 2013-09-04 02:09:45 UTC
Sep 03 2013
Sep 02 2013
Sep 01 2013
Aug 31 2013
Aug 30 2013
Aug 29 2013
Aug 28 2013
Aug 27 2013 04:26 UTC

DDoS Attack on DigitalOcean.com

The attack has been mitigated at this time, and the site and control panel are loading normally again.

History

There is an on-going DDOS attack against digitalocean.com. No hypervisors or customer virtual servers should currently be affected by this attack.

issue 2013-08-27 04:26:45 UTC

Service to digitalocean.com has been restored; we are continuing to monitor the situation.

monitoring 2013-08-27 04:38:29 UTC

The current DDOS attack has been mitigated and contained. We are actively monitoring the situation.

monitoring 2013-08-27 07:11:26 UTC

The attack is still ongoing, and we are still actively working on mitigation. There may still be some intermittent delays on the site at times while the attack persists. We will provide updates as they become available.

update 2013-08-27 13:34:36 UTC

The attack has been mitigated at this time, and the site and control panel are loading normally again.

resolved 2013-08-27 15:38:30 UTC
Aug 26 2013
Aug 25 2013
Aug 24 2013
Aug 23 2013
Aug 22 2013
Aug 21 2013
Aug 20 2013
Aug 19 2013
Aug 18 2013
Aug 17 2013
Aug 16 2013
Aug 15 2013
Aug 14 2013
Aug 13 2013
Aug 12 2013 11:44 UTC

Event processing temporarily suspended

We have temporarily suspended processing new events while we resolve the backlog of requests. We expect this to be cleared within 15 minutes. Please stand by; for now, all actions in the control panel are suspended and will resume after the queue has been cleared.

History

We have temporarily suspended processing new events while we resolve the backlog of requests. We expect this to be cleared within 15 minutes. Please stand by; for now, all actions in the control panel are suspended and will resume after the queue has been cleared.

issue 2013-08-12 11:46:02 UTC

resolved 2013-08-12 11:54:44 UTC
Aug 12 2013 09:30 UTC

DDoS attack against NY1

The NY1 datacenter received an extremely large DDoS attack that was not mitigated by our network protection layer. The issue was immediately escalated to our engineers, who were able to implement new filters to protect the network against the attack. The total duration was one hour, between 09:30 UTC and 10:30 UTC.

These new filters are actively protecting the network, and should the same attack resurface, it will not disrupt service. All network connectivity is operating at normal levels; if you experience any problems, please open a support ticket.

History

The NY1 datacenter received an extremely large DDoS attack that was not mitigated by our network protection layer. The issue was immediately escalated to our engineers, who were able to implement new filters to protect the network against the attack. The total duration was one hour, between 09:30 UTC and 10:30 UTC.

These new filters are actively protecting the network, and should the same attack resurface, it will not disrupt service. All network connectivity is operating at normal levels; if you experience any problems, please open a support ticket.

resolved 2013-08-12 12:33:19 UTC
Aug 11 2013
Aug 10 2013
Aug 09 2013 07:00 UTC

Minimal Impact: Amsterdam 1 Internet Provider Maintenance

Completed.

History

Maintenance window: August 9, 2013 07:00-10:00 UTC (GMT+0)

During the above maintenance window we will be adding a new tier 1 BGP peer (TeliaSonera) to the Amsterdam 1 (AMS1) datacenter.

Expected Impact:
We are expecting minimal impact but there can be some short periods of latency and packet loss as routing is updated to include TeliaSonera.

Resolution:
We are constantly working to improve the performance of our network and this additional network peer should improve performance for many customers.

If you have any questions, please do not hesitate to contact support.

Thanks,
DigitalOcean Networking Team

update 2013-08-09 06:55:12 UTC

Completed.

resolved 2013-08-09 08:08:38 UTC
Aug 08 2013
Aug 07 2013
Aug 06 2013 17:37 UTC

DDOS Attack on DigitalOcean.com

There is an on-going DDOS attack against digitalocean.com; no hypervisors or customer virtual servers should currently be affected by this attack. Some customers may still have limited connectivity to both the website and the API.

History

There is an on-going DDOS attack against digitalocean.com; no hypervisors or customer virtual servers should currently be affected by this attack. Some customers may still have limited connectivity to both the website and the API.

investigating 2013-08-06 17:38:22 UTC

resolved 2013-08-06 18:00:22 UTC
Aug 05 2013 20:19 UTC

All Regions: Droplet Creation Suspended

We have resolved the issue and have re-enabled creates. We are monitoring the situation closely; please report any issues via the ticketing system.

History

We have temporarily disabled droplet creation across all regions. There are no actions that need to be taken by customers. Droplet creation will be re-enabled in all regions shortly.

investigating 2013-08-05 20:20:23 UTC

We have resolved the issue and have re-enabled creates. We are monitoring the situation closely; please report any issues via the ticketing system.

update 2013-08-05 21:14:29 UTC

resolved 2013-08-05 21:32:05 UTC
Aug 04 2013
Aug 03 2013
Aug 02 2013
Aug 01 2013
Jul 31 2013
Jul 30 2013
Jul 29 2013
Jul 28 2013 07:00 UTC

Reminder: AMS1 Planned Network Provider Maintenance 07/28/2013

nLayer has informed us that they have completed the work, and routing has been restored.

History

Start: 2013-07-28 07:00:00 GMT
End: 2013-07-28 09:00:00 GMT

During the above maintenance window one of our providers will be performing router upgrades and their circuit will be down during the process. We will be gracefully re-routing traffic off this circuit to our other providers.

Expected impact:

Brief high latency as we suspend the use of the provider at the start of the window. Additional latency may happen near the end of the window as we return it to service after their upgrade.

monitoring 2013-07-28 06:45:03 UTC

nLayer has informed us that they have completed the work, and routing has been restored.

resolved 2013-07-28 07:54:21 UTC
Jul 27 2013
Jul 26 2013
Jul 25 2013
Jul 24 2013
Jul 23 2013 23:06 UTC

Minimal Impact: San Francisco 1 Internet Provider Maintenance (1 of 2) July 24 00:00 - 03:00 PDT (GMT -7)

Maintenance has been completed.

History

Maintenance window: July 24, 2013 00:00 - 03:00 PDT (GMT-7)

During the above maintenance window we will be adding a new tier 1 BGP peer (TeliaSonera) to the San Francisco 1 (SFO1) datacenter.

Expected Impact:
We are expecting minimal impact but there can be some short periods of latency and packet loss as routing is updated to include TeliaSonera.

Resolution:
We are constantly working to improve the stability of our network and our top priority is to stabilize the San Francisco region. Adding this additional network peer should improve performance and reliability.

Upcoming Part 2 Maintenance:
We will be removing a legacy BGP peer from the network over the weekend; we expect minimal impact once again and will send out a notification shortly with the expected timeframe.

If you have any questions, please do not hesitate to contact support.

Thanks,
DigitalOcean Networking Team

update 2013-07-23 23:07:32 UTC

Maintenance has been completed.

resolved 2013-07-24 08:56:57 UTC
Jul 22 2013
Jul 21 2013 19:15 UTC

Peering Outage in SFO1

At this time, everyone should be back to normal. If anyone is still experiencing any problems, please open a support ticket with a ping/traceroute/mtr to your droplet.

We will continue to monitor the datacenter closely to ensure there is no additional impact.

History

We saw some brief outages for one of our peers in SFO1.

While BGP converged on the backbones to reroute inbound traffic from nLayer to our other peers, you might have experienced a short outage. This is standard BGP behavior and is how the internet reroutes around outages.

resolved 2013-07-21 19:20:01 UTC

Upon further investigation, it also appears nLayer might have a larger problem on the West Coast. We have taken the circuit out of service and have opened a ticket with them to investigate.

investigating 2013-07-21 19:50:34 UTC

At this time, everyone should be back to normal. If anyone is still experiencing any problems, please open a support ticket with a ping/traceroute/mtr to your droplet.

We will continue to monitor the datacenter closely to ensure there is no additional impact.

monitoring 2013-07-21 20:42:36 UTC
Jul 20 2013
Jul 19 2013
Jul 18 2013 20:03 UTC

DDOS Attack on DigitalOcean.com

All connectivity to both digitalocean.com and the API should be back to normal.

History

There is an on-going DDOS attack against digitalocean.com; no hypervisors or customer virtual servers should currently be affected by this attack.

At this time the website and API are currently unavailable as we are working on mitigating the attack to restore the availability of both services.

Some customers may still have limited connectivity to both the website and the API.

investigating 2013-07-18 20:03:54 UTC

All connectivity to both digitalocean.com and the API should be back to normal.

resolved 2013-07-18 20:37:35 UTC
Jul 17 2013 19:42 UTC

DigitalOcean.com website and API Issue

We've traced the issue and corrected it; all of the web services and the API should be back to normal now.

History

We are investigating an issue with the DigitalOcean website and API service.

Currently both are unavailable and we are investigating the issue to restore service as soon as possible to both.

No customer virtual servers should be affected as a result of this issue.

investigating 2013-07-17 19:43:42 UTC

We've traced the issue and corrected it; all of the web services and the API should be back to normal now.

resolved 2013-07-17 19:47:13 UTC
Jul 16 2013 02:00 UTC

Emergency Network Maintenance in NYC1 and AMS1

AMS1 is completed.

We are seeing a major improvement already. Please don't hesitate to contact support if needed.

History

Maintenance Window: July 15, 2013 22:00 - July 16 02:00 EDT (GMT-4)

We will be performing network maintenance on the core routers in both locations. We've been working with our network vendors regarding the recent instability that was experienced, and we have isolated the issue. The proposed fix requires rebooting the core routers with a new network configuration, which should completely resolve the issues. The maintenance will first be done in the NYC1 region and then in the AMS1 region.

Impact:

During the above maintenance period there may be several periods of packet loss or increased latency, lasting approximately 5-10 minutes per incident, as the network configuration reconverges.

issue 2013-07-16 00:26:52 UTC

NYC1 has been completed. AMS1 will start shortly.

investigating 2013-07-16 02:48:17 UTC

AMS1 is completed.

We are seeing a major improvement already. Please don't hesitate to contact support if needed.

resolved 2013-07-16 03:29:55 UTC
Jul 15 2013
Jul 14 2013 23:59 UTC

Scheduled Maintenance: Website and API.

Maintenance has been completed.

History

We will be upgrading our website and database. The website will be down for approximately 30-120 minutes.

update 2013-07-15 00:14:13 UTC

Maintenance has been completed.

resolved 2013-07-15 01:15:06 UTC
Jul 13 2013
Jul 12 2013
Jul 11 2013 02:50 UTC

Investigating NYC1 connectivity reports.

All NY1 connectivity issues have been resolved.

History

We are investigating reports of some IP ranges being unreachable from the internet. The issue appears to be isolated to a very small IP range with one of our providers.

investigating 2013-07-11 02:52:49 UTC

Our peers are working on fixing their backbone issue impacting approximately 200 droplets in NYC1. We hope to have the issue repaired within the hour.

issue 2013-07-11 03:24:50 UTC

All NY1 connectivity issues have been resolved.

resolved 2013-07-11 03:42:41 UTC
Jul 10 2013 15:00 UTC

Droplet Creation Temporarily Suspended

IP addresses have been successfully added to all regions and Droplet creation has been re-enabled.

History

We are currently under maintenance while we are adding additional IP addresses to all regions. In the meantime, we have temporarily suspended Droplet creation. Droplet creation will be back online within 1-2 hours. We apologize for the inconvenience.

issue 2013-07-10 15:04:47 UTC

IP addresses have been successfully added to all regions and Droplet creation has been re-enabled.

resolved 2013-07-10 16:28:37 UTC
Jul 09 2013 19:59 UTC

DDOS Attack on DigitalOcean.com

All connectivity to both digitalocean.com and the API should be back to normal.

We will be continuing to review this specific attack and making adjustments as necessary to our core infrastructure to make it more resilient in the future.

History

There is an on-going DDOS attack against digitalocean.com; no hypervisors or customer virtual servers are currently affected by this attack.

At this time the website and API are currently unavailable as we are working on mitigating the attack to restore the availability of both services.

issue 2013-07-09 20:01:42 UTC

We've rolled out a few changes which should help in mitigating the attack and some customers should once again be able to access the website as well as the API.

We are continuing our efforts to completely resolve this issue and also reviewing the vector that was used for the attack to improve the resilience of our systems moving forward.

update 2013-07-09 20:43:10 UTC

resolved 2013-07-09 22:01:54 UTC

All connectivity to both digitalocean.com and the API should be back to normal.

We will be continuing to review this specific attack and making adjustments as necessary to our core infrastructure to make it more resilient in the future.

resolved 2013-07-09 22:13:55 UTC
Jul 08 2013
Jul 07 2013
Jul 06 2013 21:00 UTC

SFO - Disabled for new servers

This has been resolved.

History

Our Engineers are continuing to investigate major event delays in the SFO Datacenter.

We appreciate your ongoing patience with this matter, and will have further updates for you shortly.

investigating 2013-07-07 10:45:31 UTC

To prevent an increasing event backlog, we have temporarily disabled new droplet creation in the SFO Datacenter.

Further updates from our Engineers soon. We appreciate your continued patience.

update 2013-07-07 12:11:25 UTC

This has been resolved.

investigating 2013-07-07 17:17:59 UTC

resolved 2013-07-07 17:18:21 UTC
Jul 05 2013
Jul 04 2013
Jul 03 2013
Jul 02 2013 04:52 UTC

Event delays in SFO

There were issues related to a DDoS attack, all of which have now been resolved.

History

We are working to resolve the issue.

investigating 2013-07-02 04:53:39 UTC

There were issues related to a DDoS attack, all of which have now been resolved.

resolved 2013-07-02 09:28:15 UTC
Jul 01 2013 06:15 UTC

SFO - Large DDOS

At approximately 23:05 PDT (2013-06-30) there was a large DDOS in the SF1 data center. Most customers would have experienced short bursts of high latency and minor packet loss as this traffic was blocked and removed from our peering backbones.

History

At approximately 23:05 PDT (2013-06-30) there was a large DDOS in the SF1 data center. Most customers would have experienced short bursts of high latency and minor packet loss as this traffic was blocked and removed from our peering backbones.

resolved 2013-07-01 07:09:29 UTC
Jun 30 2013
Jun 29 2013 10:45 UTC

SFO - Large DDOS

At approximately 03:45 PDT there was a large DDOS in the SF1 data center. Most customers would have experienced short bursts of high latency and minor packet loss as this traffic was blocked and removed from our peering backbones.

History

At approximately 03:45 PDT there was a large DDOS in the SF1 data center. Most customers would have experienced short bursts of high latency and minor packet loss as this traffic was blocked and removed from our peering backbones.

resolved 2013-06-29 11:22:59 UTC
Jun 28 2013
Jun 27 2013 22:00 UTC

New York 1: Droplet Creation Suspended

Our Engineering Team has just reported that the investigation has been concluded, and the drives thought to be defective have passed all health checks at this time.

We appreciate your ongoing patience, and if you have any questions, please feel free to reach out to your Support Team for further assistance by submitting a Ticket.

--Russell Mitchell

History

Our Engineers are currently investigating what we believe to be a large batch of defective hard drives recently deployed to the NBG Datacenter. To prevent any complications or possible data loss for new droplets, we have temporarily suspended new droplet deployments to the NBG Datacenter.

investigating 2013-06-27 22:06:46 UTC

Our Engineering Team has just reported that the investigation has been concluded, and the drives thought to be defective have passed all health checks at this time.

We appreciate your ongoing patience, and if you have any questions, please feel free to reach out to your Support Team for further assistance by submitting a Ticket.

--Russell Mitchell

resolved 2013-06-27 23:06:01 UTC
Jun 27 2013 15:47 UTC

SFO1: Large DDOS

At approximately 8:15 PDT there was a large DDoS attack in the SF1 data center. Most customers would have experienced short bursts of high latency and minor packet loss as this traffic was blocked and removed from our peering backbones.

History

At approximately 8:15 PDT there was a large DDoS attack in the SF1 data center. Most customers would have experienced short bursts of high latency and minor packet loss as this traffic was blocked and removed from our peering backbones.

resolved 2013-06-27 15:48:53 UTC
Jun 26 2013
Jun 25 2013 23:57 UTC

San Francisco Network Resolution

We experienced 2 network outages today at roughly 7:50AM EST and 8:15PM EST. These outages were related to a bug in the Cisco routing hardware that we were using.

The issue was escalated to Cisco technical support, but they were unable to find a root cause or ultimately resolve the issue. This problem has been recurring since June 1, 2013.

RESOLUTION: We have just completed an emergency network maintenance, re-enabled 2 core routers with full redundancy, and deployed an entirely new network configuration. This configuration removes a previous feature which we suspect was the cause of the network outages over the course of this month.

We are continuing to monitor the network overnight and are committed to ensuring that we deliver 100% network uptime going forward. Barring any major incidents overnight, we should be in a permanent configuration and provide stable service moving forward.

History

We experienced 2 network outages today at roughly 7:50AM EST and 8:15PM EST. These outages were related to a bug in the Cisco routing hardware that we were using.

The issue was escalated to Cisco technical support, but they were unable to find a root cause or ultimately resolve the issue. This problem has been recurring since June 1, 2013.

RESOLUTION: We have just completed an emergency network maintenance, re-enabled 2 core routers with full redundancy, and deployed an entirely new network configuration. This configuration removes a previous feature which we suspect was the cause of the network outages over the course of this month.

We are continuing to monitor the network overnight and are committed to ensuring that we deliver 100% network uptime going forward. Barring any major incidents overnight, we should be in a permanent configuration and provide stable service moving forward.

resolved 2013-06-25 23:58:08 UTC
Jun 24 2013 03:33 UTC

SFO Maintenance

We identified a core router with faulty hardware and replaced it with a new factory unit. Within the next couple of days we will add additional network capacity to complete the San Francisco core network upgrade. At that point network service should be restored to 100% with improved latency and fewer hops.

History

We are working on the core routers and provider uplinks in our San Francisco data center for the next hour. There may be temporary packet loss. We are aware of the problem and are working to resolve it as quickly as possible.

investigating 2013-06-24 03:35:52 UTC

We identified a core router with faulty hardware and replaced it with a new factory unit. Within the next couple of days we will add additional network capacity to complete the San Francisco core network upgrade. At that point network service should be restored to 100% with improved latency and fewer hops.

resolved 2013-06-24 04:38:32 UTC
Jun 23 2013
Jun 22 2013
Jun 21 2013 13:37 UTC

Amsterdam Capacity Issue

We have enabled droplet creation in Amsterdam again.

History

We have disabled droplet creation in Amsterdam. We are currently adding capacity in our Amsterdam Data Center and will be re-enabling creates as soon as possible.

There are no actions that need to be taken by customers.

issue 2013-06-21 13:40:42 UTC

We have enabled droplet creation in Amsterdam again.

resolved 2013-06-21 15:45:50 UTC
Jun 20 2013
Jun 19 2013
Jun 18 2013
Jun 17 2013 21:17 UTC

Networking Issue Affecting Hypervisors

The code change has been deployed to all hypervisors and all virtual machines should once again have network connectivity. We will be doing a full review of the issue after we've had a chance to respond to all customer inquiries and will provide more information on this issue at that time.

History

There is currently a networking issue that is affecting some hypervisors and causing connectivity issues for virtual machines.

We have temporarily disabled all event processing and are working on a fix for this issue. Current estimate is that it should be resolved within the next 10-15 minutes.

There are no actions that need to be taken by customers and there is no need to power cycle your virtual server.

issue 2013-06-17 21:19:03 UTC

We've deployed a code update to the affected hypervisors and the connectivity to virtual servers should be restored without the need for any customer intervention.

The update is rolling out to all hypervisors and should correct the issue for any VMs that are still currently without network connectivity.

update 2013-06-17 21:34:00 UTC

The code change has been deployed to all hypervisors and all virtual machines should once again have network connectivity. We will be doing a full review of the issue after we've had a chance to respond to all customer inquiries and will provide more information on this issue at that time.

resolved 2013-06-17 21:37:18 UTC
Jun 16 2013
Jun 15 2013
Jun 14 2013
Jun 13 2013
Jun 12 2013
Jun 11 2013
Jun 10 2013
Jun 09 2013
Jun 08 2013
Jun 07 2013
Jun 06 2013
Jun 05 2013
Jun 04 2013
Jun 03 2013
Jun 02 2013
Jun 01 2013 03:48 UTC

SF1 Region Networking Issue

After further troubleshooting, it looks like the issue was the result of a bad BGP session with one of our upstream providers. BGP is an edge protocol that ISPs use to announce routes; these sessions are set up between us and our providers and, effectively, all of the other providers on the internet for the entire public IP range. One of the BGP sessions deteriorated, which caused our core routers to misbehave. While troubleshooting, we had to determine that it wasn't the fault of our equipment, so we swapped between the routers and also between our neighbors on our BGP sessions. We also updated our core routers to ignore certain malformed BGP sessions, which can be problematic as they can cause a core router to overload.
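
As a rough illustration of the kind of session-state check involved, the sketch below flags BGP neighbors that are not in the Established state by parsing captured "show bgp summary"-style output. This is a generic monitoring sketch, not our actual tooling; the column layout, input handling, and alert text are all assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical check: flag BGP neighbors that are not Established.

Generic sketch only. It assumes 'show bgp summary' output has been captured
to stdin; the column layout handled below is typical but varies by platform
and OS version.
"""
import re
import sys

# Typical summary row: neighbor IP, BGP version, remote AS, message counters,
# uptime, and finally either a prefix count (session Established) or a state
# name such as Idle/Active/Connect (session down or still negotiating).
ROW = re.compile(r"^(\d+\.\d+\.\d+\.\d+)\s+\d+\s+(\d+)\s+.*\s(\S+)$")

def failed_sessions(summary_text):
    """Yield (neighbor, remote_as, state) for rows without a prefix count."""
    for line in summary_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        neighbor, remote_as, last_column = match.groups()
        if not last_column.isdigit():   # no prefix count => not Established
            yield neighbor, remote_as, last_column

if __name__ == "__main__":
    down = list(failed_sessions(sys.stdin.read()))
    for neighbor, remote_as, state in down:
        print(f"ALERT: BGP session to {neighbor} (AS{remote_as}) is {state}")
    sys.exit(1 if down else 0)
```

Run against saved router output (for example, `python3 bgp_check.py < summary.txt`), it exits non-zero when any session is down, so it could be wired into an existing alerting pipeline.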

We've been monitoring the status of the SF region for the past hour and have not observed any new instabilities and it looks like the issue has been resolved. We will continue to monitor the region to ensure that there are no further issues.

History

At approximately 9:45PM EST we observed a networking issue in our San Francisco region. The issue was related to the core routers and their BGP sessions, which were failing and being reset. This could have been caused by bad packets from one of our providers with whom we maintain BGP sessions. BGP is how our routes are advertised on the internet, and if there is an issue it effectively makes the network appear as if it is down. We have escalated the matter to Cisco, the hardware vendor of our core routers, as well as to our network providers to see if there is an issue on their end which could possibly be causing the core routers to misbehave.

At this time we've been able to restore the sessions and connectivity, but we are still observing intermittent issues and continuing to troubleshoot this matter.

issue 2013-06-01 03:48:00 UTC

The BGP sessions have been restored but we are still troubleshooting the issues.

update 2013-06-01 04:27:00 UTC

After further troubleshooting, it looks like the issue was the result of a bad BGP session with one of our upstream providers. BGP is an edge protocol that ISPs use to announce routes; these sessions are set up between us and our providers and, effectively, all of the other providers on the internet for the entire public IP range. One of the BGP sessions deteriorated, which caused our core routers to misbehave. While troubleshooting, we had to determine that it wasn't the fault of our equipment, so we swapped between the routers and also between our neighbors on our BGP sessions. We also updated our core routers to ignore certain malformed BGP sessions, which can be problematic as they can cause a core router to overload.

We've been monitoring the status of the SF region for the past hour and have not observed any new instabilities and it looks like the issue has been resolved. We will continue to monitor the region to ensure that there are no further issues.

resolved 2013-06-01 04:47:00 UTC
May 31 2013
May 30 2013
May 29 2013
May 28 2013
May 27 2013
May 26 2013
May 25 2013
May 24 2013
May 23 2013
May 22 2013
May 21 2013
May 20 2013
May 19 2013
May 18 2013
May 17 2013
May 16 2013
May 15 2013
May 14 2013
May 13 2013
May 12 2013
May 11 2013
May 10 2013
May 09 2013
May 08 2013
May 07 2013
May 06 2013
May 05 2013 19:43 UTC

NY1 Region Networking Issue

Resolution

The issue was caused by a networking change that left several switches unable to respond correctly, causing them to drop off from the core network. The affected switches had to be rebooted and re-converged onto the network.

Impact

Because the switches fell off the core public network, affected customers lost connectivity; the issue was resolved once we were able to reboot the switches and re-converge them onto the core network. All switches had re-converged by 1:05PM EST. The issue lasted approximately 22 minutes.

Next Steps

Due to this issue, we are going to replace the top-of-rack switches in the NY1 region. We will work with Cisco to bring in new equipment and coordinate maintenance windows with all customers during off-peak hours to move all affected hypervisors onto the new switching equipment. Our ETA is to have this transition completed within the next 45 days. At this time we do not believe that the specific hardware involved is the cause of the issue we experienced, but we will proactively take these steps nonetheless.

History

At approximately 12:43PM EST there was a networking issue which caused connectivity to be lost for virtual servers in the NY region. The issue affected approximately 25% of customers in the region. Our technical support staff immediately escalated the matter to both our network engineers and datacenter technical staff. Our staff was on location at the time to investigate and remedy the issue as quickly as possible.

issue 2013-05-05 19:43:00 UTC

Resolution

The issue was caused by a networking change that left several switches unable to respond correctly, causing them to drop off from the core network. The affected switches had to be rebooted and re-converged onto the network.

Impact

Because the switches fell off the core public network, affected customers lost connectivity; the issue was resolved once we were able to reboot the switches and re-converge them onto the core network. All switches had re-converged by 1:05PM EST. The issue lasted approximately 22 minutes.

Next Steps

Due to this issue, we are going to replace the top-of-rack switches in the NY1 region. We will work with Cisco to bring in new equipment and coordinate maintenance windows with all customers during off-peak hours to move all affected hypervisors onto the new switching equipment. Our ETA is to have this transition completed within the next 45 days. At this time we do not believe that the specific hardware involved is the cause of the issue we experienced, but we will proactively take these steps nonetheless.

resolved 2013-05-05 20:05:00 UTC
May 04 2013
May 03 2013
May 02 2013
May 01 2013
Apr 30 2013
Apr 29 2013
Apr 28 2013
Apr 27 2013
Apr 26 2013
Apr 25 2013
Apr 24 2013
Apr 23 2013
Apr 22 2013
Apr 21 2013
Apr 20 2013
Apr 19 2013
Apr 18 2013
Apr 17 2013
Apr 16 2013
Apr 15 2013
Apr 14 2013
Apr 13 2013
Apr 12 2013
Apr 11 2013
Apr 10 2013
Apr 09 2013
Apr 08 2013 23:07 UTC

NY1 Region Networking Issue

At approximately 3:05PM EST we experienced an issue on our core network in the NY1 Region, which occurred while we were performing normal networking configuration updates. The issue affected approximately 18% of customer virtual machines in the NY1 region.

Resolution

We immediately reversed the change to return the network to its original configuration and then tracked the issue down to the redundant connection between the two core routers. The update caused the redundant link to go down and both routers to improperly route traffic, which is what affected network connectivity for some customers.

The redundant link needed to be torn down and reset in order for the core routers to once again re-establish communication and route traffic accordingly.

Impact

Because the issue occurred on the core network, it resulted in a loss of connectivity for affected customers. Our troubleshooting and resolution took 20 minutes, which was the duration of the downtime for the affected customers.

Vendor Escalation

Given that it was a regular networking update and not a scheduled maintenance that caused the issue, we have escalated this to our network vendor for review to see if there is potentially a bug in the current version of the OS we are running, and also to further troubleshoot why this simple change caused the issue to rapidly escalate and break down the routing fabric.

We are providing them with the necessary logs for review, and if a core network upgrade is required, we will open a maintenance window to process those changes. Given that there are two core routers in a redundant setup, the upgrade, should it be necessary, will not have any customer-facing impact.

History

At approximately 3:05PM EST we experienced an issue on our core network in the NY1 Region, which occurred while we were performing normal networking configuration updates. The issue affected approximately 18% of customer virtual machines in the NY1 region.

Resolution

We immediately reversed the change to return the network to its original configuration and then tracked the issue down to the redundant connection between the two core routers. The update caused the redundant link to go down and both routers to improperly route traffic, which is what affected network connectivity for some customers.

The redundant link needed to be torn down and reset in order for the core routers to once again re-establish communication and route traffic accordingly.

Impact

Because the issue occurred on the core network, it resulted in a loss of connectivity for affected customers. Our troubleshooting and resolution took 20 minutes, which was the duration of the downtime for the affected customers.

Vendor Escalation

Given that it was a regular networking update and not a scheduled maintenance that caused the issue, we have escalated this to our network vendor for review to see if there is potentially a bug in the current version of the OS we are running, and also to further troubleshoot why this simple change caused the issue to rapidly escalate and break down the routing fabric.

We are providing them with the necessary logs for review, and if a core network upgrade is required, we will open a maintenance window to process those changes. Given that there are two core routers in a redundant setup, the upgrade, should it be necessary, will not have any customer-facing impact.

resolved 2013-04-08 23:31:00 UTC
Apr 07 2013
Apr 06 2013
Apr 05 2013
Apr 04 2013
Apr 03 2013
Apr 02 2013
Apr 01 2013
Mar 31 2013
Mar 30 2013
Mar 29 2013
Mar 28 2013
Mar 27 2013
Mar 26 2013
Mar 25 2013
Mar 24 2013
Mar 23 2013
Mar 22 2013
Mar 21 2013
Mar 20 2013
Mar 19 2013
Mar 18 2013 22:04 UTC

NY1 Snapshot Issue & Resolution

Snapshot and Backup Roadmap


1. Offsite Snapshots and Backups Storage on Amazon Glacier


We've already begun building the initial framework for storing all snapshots and backups on Amazon Glacier to ensure that there is a copy of each snapshot and backup stored with another provider. This provides an added layer of redundancy that is completely outside of DigitalOcean's network, ensuring that a single failure will not lead to any data loss for customers.

Snapshots and backups will begin syncing to Amazon Glacier on Tuesday, and we will provide an update when the sync of all existing snapshots and backups is complete. All new snapshots and backups will be automatically synced starting Wednesday.

This will allow us to always pull snapshots and backups out of Glacier and make them available for customers if for any reason one of our NAS systems experiences an issue (a minimal upload sketch follows this roadmap).

2. Snapshot and Backup Downloads


Customers have already been requesting that we provide a way for them to directly download their snapshots and backups so they can store them locally and allow for data portability. We will be implementing this as soon as possible. While this may seem like a trivial item to add, the complexity lies in rolling out this feature securely, because data will need to be made available from the backend NAS systems, which are currently completely off the public network.

3. Communication


As a startup, we often develop rapidly, and there are a lot of internal conversations about the merits of particular features and their development. We try to push this communication back to customers through our UserVoice forum, but we also need to curate this conversation better to keep customers better informed of our overall product roadmap. To that end we will be using our blog and updating it more frequently, not only with feature announcements but also with development updates, to let everyone know what planned changes we are currently working on.

4. Backup Pricing


Part of being a startup is sometimes admitting when you've made a mistake, correcting it, and improving the overall service having learned from those issues. When we initially launched, we planned to offer pricing for bandwidth, snapshots, and backups; however, we were simply more focused on developing the core functionality than on introducing those pricing guidelines. This is a mistake we ran into with bandwidth as well, and we are looking to correct it now. Our official pricing for backups will be 20% of the cost of the virtual server, so if you want to enable backups for a $5/mo virtual server, the cost for backups will be $1/mo (a worked example follows this roadmap).

The main reason we are introducing pricing for backups and snapshots is to ensure that we can build out a robust backend storage solution as well as cover the cost of off-site backups, which is $0.01 per GB, and thus ensure customer data is safe and redundant with multiple providers at all times.

These pricing changes will go into effect June 1st, giving customers two months to adjust any of their service selections accordingly, and the first time they will see an invoice item for backups will be on the July 1st invoice.

5. Snapshot Pricing


Accordingly, we will also be introducing pricing for snapshots to ensure that we can provide the level of service that customers expect, including data redundancy and offsite backups. The price for snapshots will be $0.02 per GB of snapshot storage. These rates will also go into effect as of June 1st, again giving customers two months to adjust their snapshot usage as they like without incurring any fees in the interim.
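
A small worked example of the pricing in items 4 and 5 above: backups bill at 20% of the droplet's monthly price, and snapshots at $0.02 per GB stored. The specific droplet price and snapshot size used here are illustrative only, not a quote.

```python
# Worked example of the backup/snapshot pricing described above.
BACKUP_RATE = 0.20            # backups: 20% of the droplet's monthly price
SNAPSHOT_RATE_PER_GB = 0.02   # snapshots: $0.02 per GB stored per month

def monthly_backup_cost(droplet_price):
    """Monthly backup cost for a droplet billed at droplet_price per month."""
    return droplet_price * BACKUP_RATE

def monthly_snapshot_cost(snapshot_gb):
    """Monthly cost of keeping snapshot_gb gigabytes of snapshot storage."""
    return snapshot_gb * SNAPSHOT_RATE_PER_GB

print(monthly_backup_cost(5.00))    # 1.0  -> the $1/mo figure for a $5/mo droplet
print(monthly_snapshot_cost(20.0))  # 0.4  -> a hypothetical 20 GB snapshot
```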
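
For item 1 above, the sketch below shows roughly what copying a snapshot archive offsite to Amazon Glacier can look like. It uses the modern boto3 SDK purely for illustration, and the vault name, region, and file path are hypothetical; this is not our actual sync pipeline.

```python
"""Minimal sketch: copy a snapshot archive to Amazon Glacier for offsite
redundancy. Vault name, region, and file path are hypothetical."""
import boto3

glacier = boto3.client("glacier", region_name="us-east-1")

# Vaults are Glacier's top-level containers; CreateVault is idempotent.
glacier.create_vault(vaultName="snapshot-copies")

# Upload the snapshot image as a single archive. boto3 computes the required
# SHA-256 tree-hash checksum for upload_archive automatically.
with open("snapshot-12345.img.gz", "rb") as archive:
    response = glacier.upload_archive(
        vaultName="snapshot-copies",
        archiveDescription="droplet snapshot 12345",
        body=archive,
    )

# The returned archiveId is what a later retrieval job would reference.
print("stored archive:", response["archiveId"])
```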

History

On Sunday, March 17 at 3:07PM EST, one of our backup and snapshot servers suffered a hard RAID failure, causing the loss of some user backups and snapshots. We tried to recover the affected files but were unable to restore them from the failed device. This resulted in the loss of specific backups and snapshots for certain users. We have emailed all affected users a list of the specific snapshots and/or backups that are no longer available.

When we originally built the snapshot/backup system, our cloud was still in private beta and only a single copy of each snapshot was stored. Over time we improved the system: when customers spun up servers in a different region from a snapshot, that snapshot would get copied to a secondary NAS in a different geographic region.

Looking back, we've learned that we need to be more diligent in promoting new features and services out of our labs into production, ensuring that there can be no data loss or customer impact by accounting for multiple levels of failure.

issue 2013-03-18 22:04:00 UTC

Snapshot and Backup Roadmap


1. Offsite Snapshots and Backups Storage on Amazon Glacier


We've already begun building the initial framework for storing all snapshots and backups on Amazon Glacier to ensure that there is a copy of each snapshot and backup stored with another provider. This provides an added layer of redundancy that is completely outside of DigitalOcean's network, ensuring that a single failure will not lead to any data loss for customers.

Snapshots and backups will begin syncing to Amazon Glacier on Tuesday, and we will provide an update when the sync of all existing snapshots and backups is complete. All new snapshots and backups will be automatically synced starting Wednesday.

This will allow us to always pull snapshots and backups out of Glacier and make them available for customers if for any reason one of our NAS systems experiences an issue.

2. Snapshot and Backup Downloads


Customers have already been requesting that we provide a way for them to directly download their snapshots and backups so they can store them locally and allow for data portability. We will be implementing this as soon as possible. While this may seem like a trivial item to add, the complexity lies in rolling out this feature securely, because data will need to be made available from the backend NAS systems, which are currently completely off the public network.

3. Communication


As a startup, we often develop rapidly, and there are a lot of internal conversations about the merits of particular features and their development. We try to push this communication back to customers through our UserVoice forum, but we also need to curate this conversation better to keep customers better informed of our overall product roadmap. To that end we will be using our blog and updating it more frequently, not only with feature announcements but also with development updates, to let everyone know what planned changes we are currently working on.

4. Backup Pricing


Part of being a startup is sometimes admitting when you've made a mistake, correcting it, and improving the overall service having learned from those issues. When we initially launched, we planned to offer pricing for bandwidth, snapshots, and backups; however, we were simply more focused on developing the core functionality than on introducing those pricing guidelines. This is a mistake we ran into with bandwidth as well, and we are looking to correct it now. Our official pricing for backups will be 20% of the cost of the virtual server, so if you want to enable backups for a $5/mo virtual server, the cost for backups will be $1/mo.

The main reason we are introducing pricing for backups and snapshots is to ensure that we can build out a robust backend storage solution as well as cover the cost of off-site backups, which is $0.01 per GB, and thus ensure customer data is safe and redundant with multiple providers at all times.

These pricing changes will go into effect June 1st, giving customers two months to adjust any of their service selections accordingly, and the first time they will see an invoice item for backups will be on the July 1st invoice.

5. Snapshot Pricing


Accordingly, we will also be introducing pricing for snapshots to ensure that we can provide the level of service that customers expect, including data redundancy and offsite backups. The price for snapshots will be $0.02 per GB of snapshot storage. These rates will also go into effect as of June 1st, again giving customers two months to adjust their snapshot usage as they like without incurring any fees in the interim.

resolved 2013-03-18 22:47:00 UTC
Mar 17 2013
Mar 16 2013
Mar 15 2013
Mar 14 2013
Mar 13 2013
Mar 12 2013
Mar 11 2013
Mar 10 2013
Mar 09 2013
Mar 08 2013
Mar 07 2013
Mar 06 2013
Mar 05 2013
Mar 04 2013
Mar 03 2013
Mar 02 2013
Mar 01 2013
Feb 28 2013
Feb 27 2013
Feb 26 2013
Feb 25 2013
Feb 24 2013
Feb 23 2013
Feb 22 2013
Feb 21 2013
Feb 20 2013
Feb 19 2013
Feb 18 2013
Feb 17 2013
Feb 16 2013
Feb 15 2013
Feb 14 2013
Feb 13 2013
Feb 12 2013
Feb 11 2013
Feb 10 2013
Feb 09 2013
Feb 08 2013
Feb 07 2013
Feb 06 2013
Feb 05 2013
Feb 04 2013
Feb 03 2013
Feb 02 2013
Feb 01 2013
Jan 31 2013
Jan 30 2013
Jan 29 2013
Jan 28 2013
Jan 27 2013 13:43 UTC

Core Network US1 Region Issue & Resolution

At approximately 6:45AM EST we experienced an issue on our core network in the NY1 Region. The core router that we were in the process of replacing last week suffered a hard failure in which it completely lost its BGP sessions; the failure also prevented failover to the secondary router, which was part of the new pair that we were rolling out.

The original router was rebooted and rejoined the network; unfortunately, reconvergence of network devices did not go smoothly. Half of the devices did not reconverge and required manual intervention to reconnect. One top-of-rack switch had a hard time reconverging, which resulted in an extended period of network unavailability for the customers on those hypervisors.

The core network was back online at approximately 7:15AM, the majority of network devices rejoined the network at 7:30AM, and the final remaining top-of-rack switch that continued to have an issue rejoined the core network at approximately 7:50AM.

We will be automatically issuing SLA credits to all affected customers and servers.

Network Maintenance

As many customers know, we were performing network maintenance last week that caused several hiccups and periods of network unavailability that lasted 2-5 minutes apiece. This was a result of the failing core router, which was still part of the original network, having hardware issues.

The original failover core router was replaced with the first of the pair of new network cores; however, the old core was showing abnormal behavior, which was the cause of the hiccups we experienced during the network maintenance.

We were planning to complete the core network maintenance this week to completely pull out the old cores and replace them with the two newer cores, but the hardware issue the old core was experiencing became progressively worse until it caused a complete hardware failure.

Future and Solution

We will continue to monitor how the old core is performing, as it is currently still part of the core network, and review the planned network maintenance windows scheduled for this week to complete the work of removing the old core.

Currently everything is up and running, and as a result of the hard failure many network devices are now using the new core as their primary routing point. This actually moves our network maintenance ahead, meaning the remaining maintenance will require less work; the network maintenance is now further along than it was at the end of this past week.

We will be opening additional network maintenance windows this week to complete the maintenance, remove the old core entirely, and replace it with the second new core. The new core networking gear will enable growth of our NY1 region for the next 4 years; we plan all datacenter rollouts with core routers that support a 4+ year lifespan, at which point we normally enter a maintenance phase to review their performance and replace them if necessary.

AMS1 Region

The AMS1 region was unaffected by this issue. The core network in AMS was replaced in late November 2012 with new core networking gear, which means we do not have any planned maintenance for that region, as it is already running on new core infrastructure.

We rolled out the new core networking gear in AMS1 first, as it is our secondary location, and were in the process of rebuilding the core network in NY1. Unfortunately, it seems that our planned maintenance to remove the original core network in NY was causing issues on the gear, as the new configurations we were enabling between the new and old cores were part of what caused the situation to worsen.

History

At approximately 6:45AM EST we experienced an issue on our core network in the NY1 Region. The core router that we were in the process of replacing last week suffered a hard failure in which it completely lost its BGP sessions; the failure also prevented failover to the secondary router, which was part of the new pair that we were rolling out.

The original router was rebooted and rejoined the network; unfortunately, reconvergence of network devices did not go smoothly. Half of the devices did not reconverge and required manual intervention to reconnect. One top-of-rack switch had a hard time reconverging, which resulted in an extended period of network unavailability for the customers on those hypervisors.

The core network was back online at approximately 7:15AM, the majority of network devices rejoined the network at 7:30AM, and the final remaining top-of-rack switch that continued to have an issue rejoined the core network at approximately 7:50AM.

We will be automatically issuing SLA credits to all affected customers and servers.

Network Maintenance

As many customers know, we were performing network maintenance last week that caused several hiccups and periods of network unavailability that lasted 2-5 minutes apiece. This was a result of the failing core router, which was still part of the original network, having hardware issues.

The original failover core router was replaced with the first of the pair of new network cores; however, the old core was showing abnormal behavior, which was the cause of the hiccups we experienced during the network maintenance.

We were planning to complete the core network maintenance this week to completely pull out the old cores and replace them with the two newer cores, but the hardware issue the old core was experiencing became progressively worse until it caused a complete hardware failure.

Future and Solution

We will continue to monitor how the old core is performing, as it is currently still part of the core network, and review the planned network maintenance windows scheduled for this week to complete the work of removing the old core.

Currently everything is up and running, and as a result of the hard failure many network devices are now using the new core as their primary routing point. This actually moves our network maintenance ahead, meaning the remaining maintenance will require less work; the network maintenance is now further along than it was at the end of this past week.

We will be opening additional network maintenance windows this week to complete the maintenance, remove the old core entirely, and replace it with the second new core. The new core networking gear will enable growth of our NY1 region for the next 4 years; we plan all datacenter rollouts with core routers that support a 4+ year lifespan, at which point we normally enter a maintenance phase to review their performance and replace them if necessary.

AMS1 Region

The AMS1 region was unaffected by this issue. The core network in AMS was replaced in late November 2012 with new core networking gear, which means we do not have any planned maintenance for that region, as it is already running on new core infrastructure.

We rolled out the new core networking gear in AMS1 first, as it is our secondary location, and were in the process of rebuilding the core network in NY1. Unfortunately, it seems that our planned maintenance to remove the original core network in NY was causing issues on the gear, as the new configurations we were enabling between the new and old cores were part of what caused the situation to worsen.

resolved 2013-01-27 14:35:00 UTC
Jan 26 2013
Jan 25 2013
Jan 24 2013
Jan 23 2013
Jan 22 2013
Jan 21 2013
Jan 20 2013
Jan 19 2013
Jan 18 2013
Jan 17 2013
Jan 16 2013
Jan 15 2013
Jan 14 2013
Jan 13 2013
Jan 12 2013
Jan 11 2013
Jan 10 2013
Jan 09 2013
Jan 08 2013
Jan 07 2013
Jan 06 2013
Jan 05 2013
Jan 04 2013
Jan 03 2013
Jan 02 2013
Jan 01 2013