AWS comes clean about recent Sydney outage

Public Cloud provider said both primary and backup power failed

Amazon Web Services has highlighted the issues behind its power-related outage in its Sydney availability zone on Sunday night.

At 4pm Sydney time, the company reported the power issue at its Sydney region datacentres delivering its EC2 and S3 services.

The blackout led to disruption for Sydney citizens and AWS clients as the outage took out major websites such as Foxtel Play, Channel Nine, Domain and Domino’s Pizza.

Partners including Comunet, Bulletproof, RXP Services and Strut Digital were also affected, with many working through the night with clients to resolve business-critical challenges.

In a recent blog post, AWS explained that every instance is served by two independent power delivery sources: the main utility power and a generator-backed diesel rotary uninterruptible power supply (DRUPS).

AWS said that as long as either source provides power, the instance maintains availability. The DRUPS, as the secondary source, stores power and takes over if the main utility power is compromised.

However, during the severe weather, the instances that lost power lost access to both primary and secondary power, and the backup generators could not be connected to the load.

AWS described the power failure as an ‘unusually long voltage sag’, as opposed to ‘a complete outage’, and said the unexpected nature of the sag caused the set of breakers responsible for isolating the DRUPS from utility power to fail to open fast enough.

“Normally, these breakers would assure that the DRUPS reserve power is used to support the datacenter load during the transition to generator power. Instead, the DRUPS system’s energy reserve quickly drained into the degraded power grid,” the company explained.

“The rapid, unexpected loss of power from DRUPS resulted in DRUPS shutting down, meaning the generators which had started up could not be engaged and connected to the datacenter racks. DRUPS shutting down this rapidly and in this fashion is unusual and required some inspection.”
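The failure sequence AWS describes is essentially a race between the isolation breakers and the DRUPS energy reserve. A minimal sketch of that race is below. It is a toy model, not AWS's design: the class, the function and every number in it are hypothetical, chosen only to show how a slow breaker can drain the reserve before the generators are ready to take the load.

```python
from dataclasses import dataclass


@dataclass
class FailoverScenario:
    """All fields are illustrative assumptions, not figures from AWS."""
    breaker_open_s: float      # time for the isolation breakers to open
    reserve_s_isolated: float  # how long the DRUPS reserve carries the load once isolated
    grid_drain_factor: float   # how much faster the reserve drains while tied to the sagging grid
    generator_ready_s: float   # time until the generators can take over the load


def load_stays_up(s: FailoverScenario) -> bool:
    """Return True if the DRUPS reserve lasts until the generators are ready."""
    base_rate = 1.0 / s.reserve_s_isolated  # fraction of reserve used per second when isolated

    # Phase 1: breakers still closed, so the reserve also feeds the degraded grid.
    t_connected = min(s.breaker_open_s, s.generator_ready_s)
    used = t_connected * base_rate * s.grid_drain_factor
    if used >= 1.0:
        return False  # reserve exhausted; DRUPS shuts down before the generators can be engaged

    # Phase 2: breakers open, so the reserve supports only the datacentre load.
    t_isolated = max(0.0, s.generator_ready_s - s.breaker_open_s)
    used += t_isolated * base_rate
    return used < 1.0


# Breakers open quickly: the reserve bridges the gap to generator power.
normal = FailoverScenario(breaker_open_s=0.1, reserve_s_isolated=20.0,
                          grid_drain_factor=50.0, generator_ready_s=10.0)
# Breakers open too slowly during a long voltage sag: the reserve drains into the grid.
sag = FailoverScenario(breaker_open_s=2.0, reserve_s_isolated=20.0,
                       grid_drain_factor=50.0, generator_ready_s=10.0)

print(load_stays_up(normal), load_stays_up(sag))
```

In this sketch, shortening `breaker_open_s` is the only change needed to keep the load up, which mirrors the kind of remediation AWS describes.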

In remediation, AWS said it will add breakers to ensure a quicker break from degraded utility power, allowing the generators to activate before the UPS systems are depleted.

The company added that it will also prioritise fixing the ‘latent bug’ that disabled the automatic recovery systems in customer instances.

AWS said more than 80 per cent of the impacted customer instances and volumes were online and operational by 1 am PDT, after power was restored at 11:46 pm PDT.

According to Comunet chief executive Mark Ogden, 100 of his clients were affected in total, and issues for all but one were resolved within three hours.

However, this was not the case for all. Strut Digital chief executive Zack Levy told ARN that his engineers were still restoring services at 3:30 am.

“We apologise for any inconvenience this event caused. We know how critical our services are to our customers’ businesses. We are never satisfied with operational performance that is anything less than perfect, and we will do everything we can to learn from this event and use it to drive improvement across our services,” AWS said.

AWS channel partners recently told ARN the interruption has shown that businesses should review their architecture model and strategy before jumping on the Cloud bandwagon.

Holly Morgan