A blog of the Science and Technology Innovation Program
Responding to the CrowdStrike Outage: What’s Next?
The recent CrowdStrike outage highlights the risks of our interconnected society and brings forward renewed concerns around how resiliency is built into software supply chains.
On Friday, July 19, the ‘Blue Screen of Death’ inundated news feeds and social media channels during a historic global IT outage caused by faulty code. That morning, cybersecurity company CrowdStrike pushed a defective sensor configuration update to certain Windows hosts using its Falcon platform, triggering a system crash on 8.5 million Windows devices, as reported by Microsoft. What followed was global pandemonium: many banks, airlines, hospitals, and government offices were disrupted. While CrowdStrike continues to work to restore all impacted systems, this incident acts as a reminder of the risks that come with our interconnected and digitized society and has recentered debates about what a healthy and secure software supply chain should look like.
A Glimpse at a Healthy Software Supply Chain
For the past two decades, cybersecurity experts have debated the inherent dangers of a software monoculture. Most notably, in the 2003 Computer and Communications Industry Association white paper CyberInsecurity: The Cost of Monopoly, experts argued that a software supply chain that is too dependent on specific software leads to greater insecurity. A software supply chain is essentially everything that touches an application in the software development life cycle: code, components, people, and tools. A chain that lacks diversity creates a fragile ecosystem with single points of failure, one where a strain of malware or, in this case, a single bug can cause harm far and wide.
That is not to say that software monocultures do not have certain strengths or that diversifying software does not come with its own set of challenges, as Rob Enderle illustrated in InformationWeek. Diversifying our software supply chain comes with higher costs, its own set of security issues, and interoperability obstacles. Can you imagine working on a cross-federal agency project when each agency uses different software? There is already a shortage of IT talent, particularly in the government, which makes it incredibly difficult to secure diverse software, maintain expertise on the different platforms, and ensure interoperability between all the different parts.
Nevertheless, a monoculture can be great… until it’s not. The CrowdStrike outage reiterates that Microsoft’s ubiquity is still a danger. While the outage was triggered by a bug pushed by CrowdStrike, Windows devices were the vehicle for its global impact. Dependency on Windows devices magnifies the impact of any bug or security compromise. However, the issue with monocultures extends beyond Microsoft and the other biggest players. Software monocultures more broadly, such as antivirus software monocultures, can be just as dangerous. TechCrunch reported that CrowdStrike’s own dominance as a cybersecurity solution provider contributed to the lack of resiliency in this outage as well as to the number of impacted devices. In other words, “the magnitude of this bad update is so large because the number of devices running both CrowdStrike and Windows is so high,” as Eric Grenier, director analyst at Gartner, stated in Cybersecurity Dive.
Resiliency, Resiliency, Resiliency
Whether or not monocultures persist, building resiliency into the software supply chain is essential to mitigating what this outage affirms is and will forever be the weakest link when it comes to security: human actors. This bug began as faulty code, slipped through inadequate testing, and was then released in full force because there was no phased rollout.
This will happen again if proactive action is not taken. To build an effective, secure software supply chain, more failsafes are needed to prevent faulty code from being distributed. More diversified IT infrastructures are needed to limit the reach of any potential harm and to ensure redundancy during disruptions. Failsafes, diversity, and redundancy are key to resiliency. No software update should have the capability of wreaking such havoc.
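To make the idea of a phased rollout with a failsafe concrete, here is a minimal Python sketch. It is an illustration only and does not describe CrowdStrike's or any vendor's actual deployment process. The names phased_rollout, RolloutResult, stage_fractions, apply_update, and max_failure_rate are all hypothetical, as are the example wave sizes and the 1 percent failure threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class RolloutResult:
    updated: List[str]           # hosts that received the update
    halted: bool                 # True if the failsafe stopped the rollout
    failed_stage: Optional[int]  # index of the wave that tripped the failsafe


def phased_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> RolloutResult:
    """Push an update in growing waves and stop if a wave looks unhealthy.

    stage_fractions are cumulative shares of the fleet, e.g. (0.01, 0.1, 1.0).
    apply_update is a hypothetical callback that returns True when the host
    is still healthy after the update.
    """
    updated: List[str] = []
    start = 0
    for stage, fraction in enumerate(stage_fractions):
        # Each wave extends coverage to the next cumulative share of the fleet.
        end = min(len(hosts), max(start + 1, round(len(hosts) * fraction)))
        wave = hosts[start:end]
        if not wave:
            break
        failures = 0
        for host in wave:
            updated.append(host)
            if not apply_update(host):
                failures += 1
        # Failsafe: too many unhealthy hosts in this wave halts everything.
        if failures / len(wave) > max_failure_rate:
            return RolloutResult(updated, True, stage)
        start = end
    return RolloutResult(updated, False, None)


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(1000)]
    # Simulate a defective update that crashes every host it reaches.
    result = phased_rollout(fleet, (0.01, 0.1, 1.0), lambda host: False)
    print(len(result.updated), result.halted, result.failed_stage)
```

In this simulated run, a defective update reaches only the first wave of 10 hosts out of 1,000 before the rollout halts. The design choice is to trade slower distribution for a smaller blast radius: the same bug still ships, but it cannot reach the whole fleet at once.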