Amazon gets 'black eye' from cloud outage

Analysts say downtime hurts Amazon, and cloud computing

For a company that's known as the dominant player in the cloud market, Amazon's troubles on Thursday mean a black eye for the company and for the cloud in general.

Trouble started early Thursday morning when popular websites like Quora, Foursquare and Reddit were left staggering or totally knocked out because of server problems in the Amazon datacenter that handles the company's Web hosting services.

While service was restored by 4 p.m. ET to some sites such as Foursquare, Quora was still disabled and Reddit was still being affected.

"Reddit is in 'emergency read-only mode' right now because Amazon is experiencing a degradation," the company noted on its site. "They are working on it, but we are still waiting for them to get to our volumes. You won't be able to log in. We're sorry and will fix the site as soon as we can."

According to AlertSite, a Web performance management company, between 6 a.m. and 1 p.m. ET Thursday, one portion of the Reddit site took more than 60 seconds to load, only to return an error message. Foursquare's homepage had 84.44% availability between 8:15 a.m. and noon, also returning error messages explaining the downtime and slowness.

At 5:16 a.m. ET Thursday, site administrators reported that they were dealing with connectivity issues impacting Amazon's Relational Database Service, which is used to manage a cloud database, across multiple zones in the Eastern United States.

That means some Web sites were down or partially disabled for at least 11 hours.

While that's a problem for the downed sites, it's probably going to be tougher on Amazon itself, according to Robert Mahowald, an analyst with IDC. "Amazon is held as a paradigm of operational uptime," he said. "When this kind of thing happens, it definitely sends a chill through the whole cloud and hosted services industry.... It's absolutely a black eye. There's no doubt about it."

Mahowald was quick to point out that this kind of outage happens and that it doesn't indicate a specific operational problem at Amazon.

"This shouldn't give Amazon a bad reputation, but this is a very, very visible problem," he said. "I don't think it will turn people's heads away from using Amazon, but it will give companies that have been on the fence a lot of cause for pause. This will live on and on on the Web."

The biggest impact from the outage may be to the cloud itself, said Rob Enderle, an analyst with the Enderle Group.

"What will take a hit is the image of this technology as being one you can depend on, and that image was critically damaged today," he added. "If the outage continues for long, it could set back growth of this service years and permanently kill efforts by many to use this service in the future."

Given how high-profile the outage has been, it may be hard to dismiss, especially for people who have to decide whether they're moving their enterprise to the cloud.

"This provides a massive showcase of the risk associated with these kinds of services, which are sold like utilities but don't yet have the reliability we expect of most utilities," said Enderle. "The impression being set today, if the outage continues, may take five to 10 years to fully recover from."

Sharon Gaudin covers the Internet and Web 2.0, emerging technologies, and desktop and laptop chips for Computerworld. Follow Sharon on Twitter at @sgaudin or subscribe to Sharon's RSS feed. Her e-mail address is sgaudin@computerworld.com.
