Datacentre power cut knocks ‘hundreds’ of internet services offline

A power outage affecting one of Amazon Web Services’ (AWS) largest US datacentre regions reportedly knocked hundreds of online services offline across the world on Friday 2 March.  

The cloud services giant confirmed that its US-East-1 region suffered two separate power loss incidents over the course of two hours in one of the site’s network peering facilities, each one lasting about 10 minutes.  

As a result, organisations that rely on that region to host their applications and workloads “may have experienced internet connectivity issues”, said AWS in a statement on its services status page.

“Our network is designed to be fully redundant with multiple independent peering facilities in every region,” the statement continued. “Some customers experienced elevated latency and packet loss while the network rerouted affected traffic to these unaffected network peering facilities.

“Some packet loss was also observed as we restored traffic to the affected network peering facility.”

Computer Weekly contacted AWS for further details about Friday’s outage, but had not received a response at the time of publication.

According to an analysis of the incident by network monitoring company ThousandEyes, more than 240 "critical services" that run on AWS infrastructure were disrupted by the outage, including Slack, Twilio and Atlassian.

According to reports, the incident also blighted US-based users of Amazon’s voice assistant technology Alexa, as well as organisations that rely on the firm’s Direct Connect service to obtain a private connection between their datacentres and the AWS cloud.

“The AWS-East region is one of the first AWS [datacentre] regions and is, hands down, their largest, with at least five availability zones,” wrote Archana Kesavan, senior product marketing manager at ThousandEyes, in a blog post. “What started as a power outage impacting a small set of services quickly cascaded into a major event.”

News of the outage comes nearly a year to the day after Amazon's Simple Storage Service (S3) suffered an outage in the same AWS region that caused widespread disruption across the internet. That incident was triggered when an engineer incorrectly executed a command, taking an unspecified number of servers offline.

The latest incident highlights just how complex and interconnected the services that run in the public cloud are, said Kesavan.

“Outages and natural disasters in one part of the cloud can quickly ripple over into other areas,” she added. “Cloud vendors offer several ways to directly connect into their infrastructure. However, they do not make you immune from the external dependencies of the internet.

"While availability zones offer some level of redundancy, regional outages like these can quickly envelop entire clusters of datacentres."
