Think your network is too big to fall? It’s hard to imagine that any company or organization, no matter how powerful, feels immune following the spectacular CrowdStrike outage that recently struck across the globe. Nicknamed the “Blue Screen of Death” (BSOD) outage, after the Windows error screen that appeared on millions of PCs, it paralyzed industries and institutions ranging from airlines, healthcare, finance and automotive to government agencies.
Though the BSOD outage is the first of such magnitude, it’s unlikely to be the last. To understand why, consider two closely related but seemingly contradictory realities of modern IT:
The first, according to Bloomberg News, is that “the IT systems of some of the world’s biggest and most critical industries have grown heavily dependent on a handful of relatively obscure software vendors, which are now emerging as single points of failure.”
The second, and perhaps more perplexing, issue is that we don’t know how to prevent these events because of the built-in complexity of our technological systems. As one op-ed writer observes in The Atlantic, “our technological systems are too complicated for anyone to fully understand. These are not computer programs built by a single individual; they are the work of many hands over the span of many years. They are the interaction of countless components that might have been designed in a specific way for reasons that no one remembers.”
Any way you look at it, Bloomberg warns, these technological underpinnings pose an “increasingly dire threat to global supply chains.”
And while businesses may not be able to prevent these kinds of costly outages, they can definitely mitigate them with a solid Business Continuity and Disaster Recovery plan. Here’s what IT decision-makers and business owners need to know.
What is Business Continuity and Disaster Recovery (BCDR)?
Think of BCDR as your business’s game plan for when things go sideways. It’s all about making sure your company can keep running and bounce back quickly when disaster strikes – whether that’s a massive power outage, a cyber attack, or a natural disaster. Basically, it’s your business’s insurance policy against Murphy’s Law.
The key components of BCDR include continuity planning, disaster recovery planning, and crisis management. Continuity planning focuses on identifying critical business functions and developing strategies to maintain them during a disruption. Disaster recovery planning involves creating detailed procedures for restoring IT systems, data, and infrastructure after a disaster. Crisis management addresses the immediate response to an incident, including communication protocols and decision-making processes. Together, these components form a holistic framework that helps organizations minimize downtime, protect assets (including brand reputation), and maintain stakeholder confidence in the face of unexpected challenges.
What’s the difference between business continuity and disaster recovery?
In short, business continuity is about keeping the whole show running, while disaster recovery zeroes in on getting your tech back on track after a crisis. Business continuity involves planning for various scenarios, not just disasters, and covers strategies for maintaining customer service, supply chains, and other critical functions. Disaster recovery includes detailed plans for data backup, system restoration, and failover procedures. They’re two sides of the same coin, working together to protect your business.
Another related but separate concept is business resilience, which zooms out even further. In essence, business resilience is about building an organization that can weather any storm, adapt to change, and even thrive in adversity. Resilience is the overarching philosophy, while continuity and recovery are some of the tools used to achieve it.
Here’s why BCDR is crucial for businesses
Network downtime and data loss can be devastating, leading to lost revenue, damaged customer relationships, and decreased productivity. In some industries, even brief outages can result in millions of dollars in losses. Preliminary estimates put the cost of the recent CrowdStrike outage at one billion dollars.
BCDR can actually save money in the long run. While implementing these systems requires an upfront investment, the cost of not having them can be far greater. In fact, Gartner estimates the average cost of downtime to be around $5,600 per minute for large enterprises.
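To put the per-minute figure in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough linear estimate only: the function name and the sample outage durations are illustrative assumptions, and the only number taken from the text is the $5,600-per-minute rate. Real costs vary widely by business and rarely scale in a straight line.

```python
# Rough downtime-cost estimate.
# The per-minute rate is the Gartner figure quoted above; the function name
# and the sample durations below are hypothetical, chosen for illustration.
COST_PER_MINUTE_USD = 5_600


def estimated_downtime_cost(minutes: float,
                            cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return a simple linear estimate of outage cost in US dollars."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    for label, minutes in [("15-minute blip", 15),
                           ("1-hour outage", 60),
                           ("8-hour outage", 480)]:
        print(f"{label}: ${estimated_downtime_cost(minutes):,.0f}")
```

At that rate, a single one-hour outage works out to roughly $336,000, which is often enough to justify a meaningful share of the upfront BCDR investment.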
Statistics paint a grim picture for businesses without solid BCDR plans. According to the Federal Emergency Management Agency, 40% of businesses never reopen after a disaster, and of those that do, only 29% are still operating two years later. The 2019 Capital One data breach, affecting over 100 million customers, serves as a stark reminder of what can happen without proper safeguards.
Regulatory compliance is another critical factor. Many industries have strict requirements for data protection and business continuity. Failing to meet these can result in hefty fines and legal troubles. For instance, HIPAA in healthcare and GDPR in Europe have specific mandates for data protection and recovery.
Without a resilient network to run on, BCDR is just a piece of paper. That’s why Teridion is built to outsmart any connectivity challenge. In fact, it’s the only AI-WAN with a global backbone that spans 25 public cloud providers and more than 500 global PoPs.
A lot can go wrong and not everything can be anticipated. There are, however, typical scenarios to consider when expecting the unexpected in business.
1. Natural disasters:
These include events like earthquakes, hurricanes, floods, or wildfires. They can cause physical damage to facilities, disrupt power and communications, and prevent employees from reaching work.
2. Power outages:
These can be caused by grid failures, severe weather, or equipment malfunctions. Even short outages can halt operations and damage sensitive equipment.
3. IT outages and cyber attacks:
These range from system failures to deliberate attacks like ransomware or DDoS. They can cripple operations, compromise data, and damage reputation.
4. Public health crises:
As we’ve seen with COVID-19, these can force businesses to rapidly adapt to remote work, deal with staff shortages, and navigate changing regulations.
5. Physical security threats:
These include scenarios like workplace violence, terrorism, or civil unrest. They pose risks to employee safety and can disrupt normal business operations.
6. Supply chain disruptions:
These can be caused by various factors like transportation issues, geopolitical events, or supplier bankruptcies. They can lead to production delays and inventory shortages.
When Should BCDR Be Activated?
BCDR should be activated in various situations that pose a significant threat to normal business operations. These include immediate threats or escalating situations related to the six scenarios outlined in the previous section.
Other instances when BCDR should be activated include regulatory triggers, such as discovering data breaches or compliance violations that could lead to operational shutdowns, or internal issues such as loss of key personnel, major product recalls, or significant financial troubles.
The key is to have clear, predefined activation criteria for each type of scenario. This helps ensure a quick response without unnecessary false alarms. It’s also important to have different levels of activation, as not all situations require full BCDR implementation.
Remember, the goal is to activate BCDR plans early enough to be proactive, rather than waiting until the situation becomes dire.
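One way to picture predefined, tiered activation criteria is as a small lookup table. The Python sketch below is an illustration only: the level names, scenarios, metrics, and thresholds are all hypothetical, and a real plan would define its own values with input from the business.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class ActivationLevel(IntEnum):
    """Hypothetical activation tiers; higher means a broader response."""
    MONITOR = 1   # watch the situation and notify the BCDR lead
    PARTIAL = 2   # activate plans for the affected function only
    FULL = 3      # organization-wide BCDR activation


@dataclass(frozen=True)
class Criterion:
    scenario: str
    metric: str
    threshold: float          # the level applies once the metric reaches this value
    level: ActivationLevel


# Illustrative thresholds only; these numbers are assumptions, not guidance.
CRITERIA = [
    Criterion("power_outage", "minutes_without_power", 5, ActivationLevel.MONITOR),
    Criterion("power_outage", "minutes_without_power", 30, ActivationLevel.PARTIAL),
    Criterion("power_outage", "minutes_without_power", 240, ActivationLevel.FULL),
    Criterion("it_outage", "percent_systems_down", 10, ActivationLevel.PARTIAL),
    Criterion("it_outage", "percent_systems_down", 50, ActivationLevel.FULL),
]


def activation_level(scenario: str, metric: str, value: float) -> Optional[ActivationLevel]:
    """Return the highest level whose threshold has been reached, or None."""
    matched = [c.level for c in CRITERIA
               if c.scenario == scenario and c.metric == metric and value >= c.threshold]
    return max(matched) if matched else None


if __name__ == "__main__":
    level = activation_level("power_outage", "minutes_without_power", 45)
    print(level.name if level else "no activation")
```

Writing the criteria down as data, rather than leaving them to judgment in the moment, is what makes a quick response possible without triggering false alarms: anything below the lowest threshold simply returns no activation.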
Key Elements of an Effective Business Continuity and Disaster Recovery Plan
An effective Business Continuity and Disaster Recovery Plan starts with a thorough risk assessment and business impact analysis, identifying potential threats and their consequences. This informs the development of specific recovery strategies for each major risk. Key roles and responsibilities must be clearly established, including a designated BCDR team leader and specific roles for various aspects of recovery.
Crucial to this process is creating robust communication plans for all stakeholders, ensuring everyone knows how to access and share critical information during a crisis. The plan shouldn’t just exist on paper – it needs regular testing through drills and simulations to identify weaknesses and familiarize staff with their roles. Finally, the BCDR plan must be a living document, regularly reviewed and updated to reflect changes in the business environment, incorporate new technologies, and apply lessons learned from real incidents or near-misses. This ongoing process ensures the plan remains relevant and effective in protecting the organization.
Two key concepts in BCDR are Recovery Time Objective (RTO) and Recovery Point Objective (RPO):
RTO: This is the maximum acceptable time a business can be down after a disaster. It answers the question, “How quickly do we need to be back up and running?”
RPO: This represents the maximum amount of data loss a business can tolerate. It answers, “How much data can we afford to lose?”
For example, a bank might have an RTO of 1 hour (can’t be down for long) and an RPO of 0 minutes (can’t lose any transaction data). A small retail shop might have an RTO of 24 hours and an RPO of 24 hours.
These metrics help shape the BCDR strategy, determining things like how often to back up data and what kind of redundant systems are needed. They balance the cost of implementing BCDR solutions against the potential cost of downtime and data loss.
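The link between these metrics and backup frequency can be made concrete with a short Python sketch. It assumes a simple model in which the worst-case data loss equals the time between backups (a failure just before the next backup loses one full interval), and all function names and sample values are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case, a failure hits just before the next backup,
    so up to one full backup interval of data is lost."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time: timedelta, rto: timedelta) -> bool:
    """The time needed to restore service must fit within the RTO."""
    return estimated_restore_time <= rto


if __name__ == "__main__":
    # Small retail shop from the example above: RTO 24 hours, RPO 24 hours.
    shop_ok = (meets_rpo(timedelta(hours=24), rpo=timedelta(hours=24))
               and meets_rto(timedelta(hours=6), rto=timedelta(hours=24)))
    print("Retail shop with nightly backups meets its targets:", shop_ok)

    # Bank from the example above: RPO of 0 minutes.
    # Backups every 15 minutes cannot satisfy it; under this model only an
    # interval of zero (for example, synchronous replication) would.
    bank_ok = meets_rpo(timedelta(minutes=15), rpo=timedelta(0))
    print("Bank with 15-minute backups meets its RPO:", bank_ok)
```

The takeaway is that a tighter RPO pushes an organization from periodic backups toward continuous replication, which is where the cost trade-off described above comes in.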
Examples of BCDR
Crisis Management Plan:
This is the overarching strategy for handling any major disruption. It outlines the decision-making process, defines leadership roles, and sets priorities for the organization’s response to various crisis scenarios. The plan typically includes steps for assessing the situation, mobilizing resources, and coordinating different teams.
Communications Plan:
This focuses on how information will be disseminated during a crisis. It includes protocols for notifying employees, customers, suppliers, and other stakeholders. The plan outlines which communication channels to use (e.g., email, SMS, social media) and who’s responsible for crafting and delivering messages.
Data Center Recovery Plan:
This plan details how to restore IT operations if a data center is compromised. It includes procedures for activating backup systems, recovering data from offsite storage, and potentially shifting operations to a secondary data center. The plan also prioritizes which systems to recover first based on business criticality.
Network Recovery Plan:
This addresses how to restore network connectivity in case of outages or cyber attacks. It includes steps for rerouting traffic, activating redundant systems, and securing the network against ongoing threats. The plan also covers how to communicate and operate if normal network channels are down.
Virtualized Recovery Plan:
With the increasing use of cloud and virtualized environments, this plan focuses on recovering virtual machines and cloud-based services. It includes procedures for spinning up backup instances, restoring from snapshots, and ensuring data consistency across virtualized environments.
Each of these plans works together as part of a comprehensive BCDR strategy, ensuring the organization can respond effectively to a wide range of potential disruptions.
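The idea of recovering systems in order of business criticality, mentioned in the data center recovery plan, can be sketched as a simple sort. The Python example below is a minimal illustration: the system names, criticality tiers, and RTO values are invented, and a real plan would also need to account for dependencies between systems.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class System:
    name: str
    criticality: int   # 1 = most critical to the business (hypothetical scale)
    rto_hours: float   # recovery time objective for this system


def recovery_order(systems: List[System]) -> List[System]:
    """Most critical tier first; within a tier, the tightest RTO first."""
    return sorted(systems, key=lambda s: (s.criticality, s.rto_hours))


if __name__ == "__main__":
    # Invented inventory, for illustration only.
    inventory = [
        System("internal-wiki", criticality=3, rto_hours=72),
        System("payment-gateway", criticality=1, rto_hours=1),
        System("email", criticality=2, rto_hours=8),
        System("customer-database", criticality=1, rto_hours=0.5),
    ]
    for position, system in enumerate(recovery_order(inventory), start=1):
        print(f"{position}. {system.name} "
              f"(tier {system.criticality}, RTO {system.rto_hours}h)")
```

Agreeing on this ordering before an incident spares the recovery team from debating priorities while the clock is running.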
Network Infrastructure in BCDR
A reliable network is the backbone of modern business operations, supporting everything from communication and data access to customer service and financial transactions. It’s as essential as electricity.
Naturally, the performance and reliability of a network directly impact business continuity. In a disaster recovery scenario, a robust network is essential for accessing backup systems, coordinating response efforts, and maintaining communication with stakeholders. Network failures can have far-reaching consequences, potentially bringing business operations to a standstill.
Examples of network failures impacting businesses are numerous and often costly. In 2017, a configuration error at Level 3 Communications (since acquired by CenturyLink, now Lumen Technologies) caused widespread internet outages affecting major services like Netflix and PlayStation Network. In 2019, a network outage at Target stores nationwide prevented customers from making purchases for several hours, resulting in significant revenue loss. These incidents underscore the importance of including network resilience in BCDR planning. Strategies like redundant connections, failover systems, and regular network testing are essential components of a comprehensive BCDR approach.
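The combination of redundant connections and failover can be sketched in a few lines of Python. This is a simplified illustration, not a description of any vendor’s implementation: the link names and priorities are hypothetical, and the health check is passed in as a function so the example runs without touching a real network.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Link:
    name: str
    priority: int   # lower number = preferred link


def select_active_link(links: Sequence[Link],
                       is_healthy: Callable[[Link], bool]) -> Optional[Link]:
    """Return the most preferred link that passes its health check,
    or None if every link is down."""
    for link in sorted(links, key=lambda l: l.priority):
        if is_healthy(link):
            return link
    return None


if __name__ == "__main__":
    # Hypothetical connections for a single site.
    links = [
        Link("primary-fiber", priority=1),
        Link("secondary-broadband", priority=2),
        Link("lte-backup", priority=3),
    ]

    # Simulated health check: pretend the primary link has failed.
    down = {"primary-fiber"}
    active = select_active_link(links, lambda link: link.name not in down)
    print("Active link:", active.name if active else "none available")
```

In practice the health check would be a real probe, and running this selection on a schedule is one simple form of the regular network testing mentioned above.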
How Teridion Enhances Business Continuity and Disaster Recovery
Teridion’s AI-powered Network as a Service simplifies end-to-end connectivity in today’s complex network environment. Major players like Deutsche Telekom, Open Systems and Barracuda rely on Teridion to overcome various connectivity challenges, from minor latency issues to complete outages.
What sets Teridion apart is its unique global backbone. This network spans 25 public cloud providers and more than 500 global Points of Presence (PoPs). This extensive infrastructure, coupled with dynamic routing and intelligent traffic management, allows Teridion to consistently identify and use the most efficient path between any two points on the internet.
By leveraging this vast network and AI-driven routing, Teridion offers a robust solution for businesses looking to enhance their network resilience and performance. This approach not only improves day-to-day operations but also significantly bolsters Business Continuity and Disaster Recovery capabilities, ensuring that organizations can maintain critical connections even in challenging circumstances.
Future Trends in BCDR
The future of BCDR looks pretty exciting, shaped as it is by emerging technologies and evolving business needs. AI and machine learning are at the forefront, enabling predictive analysis of potential disruptions and automated response processes. Cloud-based BCDR solutions are gaining popularity for their flexibility and scalability, while IoT and edge computing are creating more resilient, distributed systems. Blockchain is also being explored for enhancing security and transparency in BCDR processes.
Looking ahead, we can expect BCDR planning to focus more heavily on cyber resilience, with strategies becoming more dynamic and deeply integrated into everyday business operations. Advanced simulation technologies will likely improve disaster preparedness. On the network infrastructure front, we’ll likely see the rise of self-healing networks powered by AI, increased adoption of software-defined networking, and the integration of 5G technology.
All this points to a future where BCDR isn’t just a boring backup plan gathering dust in a drawer, but a dynamic, always-on guardian of business operations.