After a catastrophic failure left the Google Cloud environment in tatters in June 2019, the Google Cloud Platform (GCP) suffered another multi-hour outage in July, leaving many customers concerned. Crucially, the outage wasn't caused by hackers or faulty code. Instead, there was simply a problem with the fiber optic cables linking Google's cloud servers in its eastern US region. The cables physically snapped, which all but cut off the region's connectivity with the outside world.
The good news is that Google, as always, was quick to rectify the issue. According to the team, the disruptions to the load balancing and cloud networking features were caused by physical damage to fiber bundles. An event like this is largely outside of Google's control. Even the best cloud connectivity environments in the world can't operate well if something breaks the cables at their core. The issues were reported on the Google Status page on the 3rd of July and were fully resolved by the 4th of July.
Of course, any lengthy period of downtime is sure to cause some distress, particularly among the companies that rely on the Google cloud for their day-to-day operations. For roughly nine hours, applications in the Google environment struggled to connect to services and systems in the region. Developers and admins were forced to move workloads into different regions and redirect traffic to keep everything running.
What Actually Happened to the Google Cloud
The fiber cable problems in the Google Cloud environment offer a useful insight into what the cloud really is for people who don't understand where their data goes when they're online: behind the abstraction, it still depends on physical cables and data centers. On the 3rd of July, the GCP began to lose external connectivity for various us-east1 zones and for the traffic entering them. The fiber provider responsible for the damaged cables was quickly informed and asked to investigate so that Google could restore service as soon as possible.
According to Google, to restore service as quickly as possible, the team reduced its own network usage and prioritized customer workloads. In practice, Google redirected the traffic generally intended for services hosted in the affected data center region to other locations, so that the remaining connectivity could be dedicated to customer packets first.
One upside is that customers on the Google Cloud Load Balancing service were automatically set up to have their traffic fail over to other regions, which would have minimized the impact on their workloads. Because the Google Cloud platform itself wasn't at fault, and the problem was instead attributed to the fiber provider, Google claimed that the region technically wasn't "down". However, some people responded to this claim negatively, saying that if they couldn't access Google services, then they considered them "down".
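The failover idea described above can be illustrated with a minimal sketch. This is not Google's load balancing implementation: the `Region` type, the `pick_region` helper, and the health flags are hypothetical names made up for illustration, and the region list is an assumed example. It only shows the general pattern of sending traffic to the first healthy region in a preference order.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    name: str
    healthy: bool  # outcome of a hypothetical external connectivity check


def pick_region(regions: Sequence[Region]) -> Optional[Region]:
    """Return the first healthy region in preference order, or None."""
    for region in regions:
        if region.healthy:
            return region
    return None


# Assumed preference list: the primary region has lost external connectivity.
regions = [
    Region("us-east1", healthy=False),
    Region("us-central1", healthy=True),
    Region("us-west1", healthy=True),
]

target = pick_region(regions)
print(target.name if target else "no healthy region")
```

A real load balancer also weighs capacity, latency, and how quickly health checks detect a failure, so a sketch like this only captures the routing decision itself.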
Google Isn't the Only One to Suffer Outages Lately
For those concerned about the security and reliability of Google, it's worth noting that Google hasn't been the only provider to face issues lately. The popular CDN provider Cloudflare also saw outages in late June and early July, caused by different issues. The first problem was blamed on internet routing errors that could be traced back to Verizon and that caused cascading failures. The second outage, on the other hand, was caused by a bad internal software deploy that triggered a spike in CPU usage.
With two major outages in two months, Google is working harder than ever to make sure that it provides users with the reliable experience they're looking for in the cloud. However, it's worth noting that even the biggest cloud providers run into issues at times. In this case, it wasn't the way that Google was configured, or a privacy or security issue, that led to problems with the GCP. Instead, the outage was caused by fiber cables that needed to be physically repaired before the system could get back up and running.
Ultimately, the July outage is evidence that no matter how hard you work to make a cloud environment redundant, there's always the chance that something could go wrong. That's why it's so important for businesses to work with a cloud provider that solves issues as quickly as Google.