WOOF! Newsletter

August 16, 2024

What's the Risk When the Majority of the Country's Data Runs on Only 3 Cloud Providers?

The CrowdStrike outage took down an estimated 8.5 million Windows computers, wreaking havoc on airlines, hospitals, businesses ... and some of Microsoft's own cloud servers. The crash previewed what a massive cloud outage would look like.

Are there any companies in 2024 that don’t have at least one of their day-to-day applications on the cloud, if not all of them? It's no wonder cloud services are so popular. They enable remote work, offer a predictable monthly expense, and reduce IT staffing costs.

As the last remaining holdouts move to the cloud, are we creating a situation where the next outage takes down a major swath of the country’s essential computing systems? The recent CrowdStrike outage simulated that type of event. 


One of the Worst IT Outages in History: What Happened with the CrowdStrike Outage

In case you haven’t heard the full story:

On July 19, 2024, computers across the world began receiving a new update from CrowdStrike, a big name in security solutions. The update was supposed to refresh CrowdStrike’s Falcon security software with routine content.

That’s not what happened.

The update crashed roughly 8.5 million Windows computers worldwide, by Microsoft’s estimate. Critical infrastructure went down, from airlines to government agencies to banks. Why? CrowdStrike’s Falcon software runs inside the Windows kernel, the core of the operating system, so the bad update crashed Windows itself. It took down every Windows machine that had CrowdStrike installed and received the update.

Worse still, CrowdStrike couldn’t reverse the effect remotely. They pulled the bad update within 78 minutes, but computers that had already crashed had to be fixed one by one. It took days, and in some cases weeks, to repair the damage.

The explanation we’ve all heard for the error doesn’t make sense to a lot of software engineers who are used to working under rigid QA testing processes. Even basic testing should have stopped the update before it made it out the door. The truth is out there somewhere.

 

[Image: Resiliency quote from Robert Thomas]

 

Needless to say, this is one of the worst IT outages ever, with an estimated $5 billion in losses.

While CrowdStrike is not a cloud hosting service, its flawed update simulated what a major cloud service outage or cyberattack may look like in the future, as more and more organizations consolidate their IT on the same hosts.

 

The Problem with Consolidation on the Cloud

Here’s the problem: “The Cloud” is close to an oligopoly!

Three companies basically control what we call the Cloud:

  • Google – Google Cloud, Workspace, Gmail
  • Microsoft – Azure, M365
  • Amazon – Amazon Web Services (AWS), including RDS, SES, Amplify, etc.

What happens when we consolidate so much computing power in the hands of three corporations? Risk.

  • Security Risk: Major amounts of data could be exposed through security breaches and cyberattacks. When hackers figure out a way to breach one account on a cloud provider, they can repeat the process against other accounts on the same provider.

  • Privacy Risk:  If a cloud service provider decided to look at your data, how would you ever know? You wouldn’t. Some providers even say they have the right to share your data with regulatory agencies, the government, etc.

  • Reliability Risk: Outages like the CrowdStrike incident disrupt every service that depends on the affected system. Big cloud providers offer hackers and nation states big targets to infiltrate and disrupt for a myriad of reasons. Your company’s services could get caught up in an attack directed at another.

These aren’t the only vulnerabilities we have to deal with in 2024 either.

  • Takedowns Outside Our Control: The CrowdStrike outage hurt so much because it affected more than just the CrowdStrike software. It took down millions of computers running the Windows operating system, including some of Microsoft’s own cloud servers that provide M365 software to global businesses. One vendor’s service took another one out. How do you safeguard against that?

  • Unintended Privacy Law Violations: When your data is “in the cloud”, where is it exactly? It may seem odd to consider in the current work environment, but your data’s actual location is important. Here’s why: for many industries, privacy laws can require you to keep your data within the same country. If a cloud service stores data in another country, they may not even tell you…but now you’re in violation of those laws.
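
For teams that keep an inventory of where their cloud data lives, the location question above can be checked with a short script. The Python sketch below is only an illustration: the dataset names, country codes, and both tables are made-up assumptions. In practice the storage locations would come from your provider's console or your IT team's records, and the allowed countries from your legal or compliance advisors.

```python
# Hypothetical policy: the countries where each dataset is allowed to be stored.
ALLOWED_COUNTRIES = {
    "customer_records": {"US"},
    "payroll": {"US"},
    "marketing_assets": {"US", "CA", "IE"},
}

# Hypothetical inventory: where each dataset (and its copies) is actually stored.
STORAGE_LOCATIONS = {
    "customer_records": ["US", "US"],
    "payroll": ["US", "IE"],
    "marketing_assets": ["US", "IE"],
}


def residency_violations(allowed, locations):
    """Return {dataset: sorted countries where it is stored but not allowed}."""
    violations = {}
    for dataset, countries in locations.items():
        # A dataset missing from the policy is treated as allowed nowhere.
        permitted = allowed.get(dataset, set())
        outside = sorted(set(countries) - permitted)
        if outside:
            violations[dataset] = outside
    return violations


if __name__ == "__main__":
    found = residency_violations(ALLOWED_COUNTRIES, STORAGE_LOCATIONS)
    for dataset, countries in found.items():
        print(f"{dataset}: stored in {', '.join(countries)}, which the policy does not allow")
```

With the sample tables, only the payroll dataset is flagged, because one of its copies sits outside the single country its policy allows.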


[Image: Cloud benefits and risks quote from Tom Marsland]

 

We are NOT saying this makes the cloud a poor service choice. The cloud has too many benefits not to use it. The key is to manage the risks.

 

Our Favorite Strategies for Lowering Your Cloud Risks and Ensuring Uptime

There are two major schools of thought concerning network infrastructure.

  1. Distributed – which used to mean several sites that worked in tandem.  These days the cloud is acting as a distributed infrastructure. 

  2. Centralized – in this scenario everything is in one spot, under your control.  This is on-premises hosting.

A hybrid approach works best.  It focuses on limiting the use of the big providers (AWS, Microsoft, Google) and brings your critical data under your control.

Here’s what that looks like: 

  • Use the Cloud for Applications That Require It
    • Some applications only exist on the cloud, so keep those there (Salesforce and other CRM, ERP, and project management software, etc.)
    • Depending on the sensitivity of your data, email and file servers may be appropriate for the cloud.

  • Move Your Business-Critical Data to On-Premises Hosting
    • Consider moving your business-critical systems off the public cloud, onto on-premises servers (or a private cloud with a local host you trust).  These could include file servers that contain your company’s IP, your accounting system, and Active Directory.
      • The up-front costs for on-premises servers have gone down quite a bit in recent years!
      • Smaller local hosting companies can offer better personal service, and you know exactly where your data is.

  • Wherever You Host, Always Have Managed Cloud Backups
    • Keep them in two geographically separate locations, ideally.
    • Schedule monthly test restores to ensure that you know your data is recoverable in a disaster situation.

  • Install a Backup Circuit
    • If your internet service goes down, be prepared with a failover circuit from another provider so your team can continue to work.

  • For 24/7 Uptime, Set Up a High-Availability Network Infrastructure
    • This way, if your servers go down in one location, your systems fail over to an alternate site.
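
One way to make the monthly test restore from the list above concrete is to compare checksums between the original files and a restored copy. The Python sketch below is a minimal illustration, not a replacement for your backup product's own verification. The function names are our own, and it assumes both the source folder and the restored folder are reachable from the machine running the check.

```python
import hashlib
from pathlib import Path


def file_hashes(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads each file fully into memory; fine for a sketch, not for huge files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hashes[path.relative_to(root).as_posix()] = digest
    return hashes


def verify_restore(source_dir, restored_dir):
    """Compare a restored copy against the source.

    Returns (missing, changed): files absent from the restore, and files
    whose contents differ from the source.
    """
    source = file_hashes(source_dir)
    restored = file_hashes(restored_dir)
    missing = sorted(name for name in source if name not in restored)
    changed = sorted(
        name for name in source
        if name in restored and source[name] != restored[name]
    )
    return missing, changed
```

Calling verify_restore with the live folder and the restored folder returns two lists. If both are empty, every source file came back with identical contents. Files that changed after the backup was taken will show up as "changed", so the comparison works best against a snapshot taken at backup time.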

If you have access to an IT company with senior-level Network Engineers, ask them to revisit your network strategy with a focus on reliability and high availability. They’ll recommend solutions that should cover the above ideas.
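
To show the basic idea behind failing over to an alternate site, here is a small Python sketch. It simply tries each site in priority order and picks the first one that answers. The site names and addresses are placeholders we made up, and real high-availability setups usually rely on load balancers, DNS failover, or clustering rather than a script like this.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed name lookups.
        return False


def pick_active_site(sites, check=is_reachable):
    """Return the name of the first healthy site in priority order, or None."""
    for name, host, port in sites:
        if check(host, port):
            return name
    return None


# Hypothetical sites, listed in priority order (primary first).
SITES = [
    ("primary", "primary.example.com", 443),
    ("secondary", "secondary.example.com", 443),
]
```

The health check is passed in as a parameter so the selection logic can be tested without touching the network. A simple TCP connection only proves the server is listening; a production check would also confirm the application itself responds correctly.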


Pro Tip: Avoid solutions that the government uses!  When evaluating IT solutions, it’s typically not a plus if the government uses it.  Think of all the IT-related government failures you’ve read about over the years. Steer clear of any IT service company that touts having government customers as a sales advantage. 


If it’s time to re-evaluate your cloud usage, contact PlanetMagpie’s IT Experts at sales@planetmagpie.com


Robert Douglas, IT Consulting Team Lead

consulting@planetmagpie.com