Business Continuity Planning: Preparing to draft the recovery strategy
In Part 10 of our Business Continuity Planning Series, we outlined the risk assessment stage. Understanding risks within the context of business continuity is an important prerequisite to developing a recovery strategy (the next and second-to-last phase of your organization’s overall continuity plan). In this article, we’ll discuss key steps and considerations to keep in mind when designing a recovery strategy. An effective recovery strategy will make your organization more resilient when faced with disasters.
Overview
A recovery strategy (or business recovery strategy) is a conceptual summary of the recovery processes that your organization needs to execute during a disaster in order to restore normal operations. If your organization follows the continuity plan steps we’ve outlined in our series, the relevant recovery processes will have already been identified during the business impact analysis (BIA).
Disasters are unpredictable, so the goal of your recovery strategy should be to give the business continuity teams enough information to restore business processes in an orderly manner. You will have many things to manage in the aftermath of a disaster. A well-thought-out strategy reduces that complexity by providing visibility into the things that matter most to the business. Having these insights ahead of time will better empower your teams to focus on managing the aftermath of a disaster, regaining control and restoring the key processes quickly.
Scenarios
To develop your recovery strategy, first identify the disasters that are common to your city or country. Records of past disasters will help you select and prepare for the breadth of challenges you could face. It would not make sense to consider the ramifications of a tsunami when you are landlocked. On the other hand, if your organization is based near a large body of water, that becomes a risk you should factor in, even if there is no precedent in your region.
Below are some of the disasters that could demand activation of a business continuity plan:
A fire has broken out in an office space or building and cannot be put out with a fire extinguisher. As a result, the space cannot be used for a short or prolonged period of time.
A flood, due to excessive water leakage or a natural disaster, has damaged critical business infrastructure.
An act of terrorism is declared by your local government, forcing you to reduce business operations until the government announces that it is safe to relaunch regular operations.
A city-wide power outage has occurred and restoration is expected to take a long time.
Your country has declared war and hence, normal operations are not feasible.
A severe earthquake has occurred and it has affected critical infrastructure or office buildings.
Multiple hacking attempts have been detected, but the extent of the damage is unknown or the intruder is still suspected to be inside your technology infrastructure.
Some or all software and communications systems are shut down due to a suspected security breach.
An epidemic is announced, forcing businesses and the general public to go into isolation.
A pandemic is announced, forcing businesses, most of the government and the general public to go into lockdown or isolation.
Some of your critical business systems have crashed, including their failover backups, and the team is unsure of the recovery timeline because of the complexity of restoration.
While this list is by no means exhaustive, it does provide some insight into the scenarios that can create instability within your business for long enough to warrant implementing a continuity plan. However, developing a separate recovery strategy for each of these scenarios is not realistic, regardless of their likelihood. Hence, your organization will need to categorize all possible scenarios into two or three major groups.
Categories for planning the recovery strategy
If you list all possible disaster scenarios and analyze them, you will start noticing patterns that will enable you to fit all of them into two or three categories. The two most common categories are:
Denial of access to the office buildings
Denial of access to the information systems
Most of the time, these two categories will suit your needs, unless your business faces a disruption that is not a disaster as described in the previous section. Some specialized industries, such as manufacturing or oil, may be unable to operate their production plants at all when critical spare parts are unavailable.
Other types of businesses may run into a situation where a lack of a resource, such as cash flow, affects the business until the required resource is found. In such situations, the business cannot be shut down, since it has enough customers to serve, but it has insufficient cash to buy new materials and subsequently deliver the goods.
Hence, the third most common category is:
Unable to produce or serve customers due to the unavailability of resources
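As a rough illustration of the grouping exercise described above, the sketch below assigns a handful of scenarios to the three categories and then inverts the mapping so each category lists the scenarios it has to cover. The scenario labels, the `Category` enum, `SCENARIO_CATEGORIES` and `group_by_category` are all hypothetical names chosen for this example, and the assignments themselves are assumptions rather than a recommended mapping.

```python
from collections import defaultdict
from enum import Enum


class Category(Enum):
    """The three planning categories named in this article."""
    BUILDING_ACCESS = "Denial of access to the office buildings"
    SYSTEMS_ACCESS = "Denial of access to the information systems"
    RESOURCE_SHORTAGE = (
        "Unable to produce or serve customers due to the unavailability of resources"
    )


# Hypothetical assignments for illustration only; your own analysis may
# place a scenario in a different category, or in more than one.
SCENARIO_CATEGORIES = {
    "office fire": [Category.BUILDING_ACCESS],
    "severe earthquake": [Category.BUILDING_ACCESS],
    "flood damaging infrastructure": [
        Category.BUILDING_ACCESS,
        Category.SYSTEMS_ACCESS,
    ],
    "suspected security breach": [Category.SYSTEMS_ACCESS],
    "critical system crash": [Category.SYSTEMS_ACCESS],
    "spare parts unavailable": [Category.RESOURCE_SHORTAGE],
    "cash flow shortfall": [Category.RESOURCE_SHORTAGE],
}


def group_by_category(scenario_categories):
    """Invert a scenario -> categories mapping into category -> scenarios."""
    grouped = defaultdict(list)
    for scenario, categories in scenario_categories.items():
        for category in categories:
            grouped[category].append(scenario)
    return dict(grouped)


if __name__ == "__main__":
    for category, scenarios in group_by_category(SCENARIO_CATEGORIES).items():
        print(f"{category.value}: {', '.join(scenarios)}")
```

Allowing a scenario to sit in more than one category is a design choice made here because some events, such as a flood, can plausibly deny access to both a building and the systems housed in it.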
Classification of the disasters
Once you have listed the potential disasters and grouped them into categories, it is time to develop a classification system. This system will help you determine when to execute the business continuity plan and which part of it to execute in a disaster. Generally, executing a business continuity plan is expensive, so unless full activation is unavoidable, only the relevant portion of the plan should be executed. Below is a three-tiered classification for you to consider implementing.
Low
An incident has occurred during a non-business-critical period, in a non-critical area of your building or on a non-business-critical information system. The teams involved in managing the crisis have completed their assessment and declared it minor enough that the BCP does not need to be activated in its entirety. Instead, only part of the BCP needs to be activated (such as the crisis management segment).
Medium
An incident that has occurred and could disrupt the business for a day or two can be classified as Medium. Examples of incidents that you might classify as Medium are:
Part of your office building is not accessible due to a fire incident. Some reportable injuries have been identified, but no loss of human or animal life is expected.
A security breach is reported and some of the non-business-critical systems are affected. The cybersecurity response team is confident that the breach can be contained without impacting business-critical information such as customer data, financial data or information from the industrial control systems. However, the restoration of these systems might take more than two days and many security sweeps must be performed before returning to normalcy.
Incidents under this classification require activation of your entire business continuity plan, including the recovery site if applicable. However, the situation is under control or not expected to go out of control. The confidence level after the investigations is high and hence, there is no need to declare a disaster.
High
One or more incidents have disrupted the business within a few hours of each other and the expected duration of the disruption is likely to remain unknown for more than a day. Board members, investors, customers and authorities outside the organization must be kept informed or involved.
Examples of incidents that can be classified as high are:
Heavy storms have caused havoc during business hours, damaging the majority of your building. Heavy casualties are expected. Local authorities are involved in the rescue operations and local (or international, depending on your organization) media must be kept informed.
The federal government has declared a pandemic and the timeline to return to normalcy is not yet known. However, civilians are asked to self-isolate and to limit travel to accessing only the most essential services.
A major cyberattack is identified, but the severity has not yet been determined. Third-party incident response teams are involved in containing the attack and restoring the information systems.
Infrastructure in the primary data center is rendered unusable due to a major water leak. It is decided that cloud infrastructure should be procured and critical business systems must be brought online to avoid catastrophe, and it is going to take more than a few business days.
As you can see, these incidents can result in a disaster and cripple an organization very quickly if the business does not have alternate plans or the means to restore critical business services.
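To make the three tiers more concrete, here is a minimal sketch of how the Low, Medium and High definitions above might be turned into a decision rule. It is only one possible reading of the text: the `Assessment` fields, the `classify` function and the `ACTIVATION` table are hypothetical names invented for this example, and the rule that an unknown duration or an uncontained situation leads to High is an assumption drawn from the descriptions above.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


@dataclass(frozen=True)
class Assessment:
    """Hypothetical summary of what the crisis management team found."""
    business_critical: bool  # hits a critical period, area or system
    assessed_minor: bool     # team judged the incident minor
    under_control: bool      # contained, or not expected to escalate
    duration_known: bool     # length of the disruption can be estimated


# Activation scope per tier, paraphrased from the article. The High entry
# assumes the whole plan is activated, which the article implies but does
# not state outright.
ACTIVATION = {
    Severity.LOW: "Activate only part of the BCP, such as crisis management.",
    Severity.MEDIUM: "Activate the entire BCP, including the recovery site "
                     "if applicable; no disaster declaration needed.",
    Severity.HIGH: "Activate the entire BCP and keep the board, investors, "
                   "customers and outside authorities informed.",
}


def classify(assessment: Assessment) -> Severity:
    """Map an assessment to a tier using the assumed rules described above."""
    # An unknown duration or a loss of control pushes the incident to High.
    if not assessment.under_control or not assessment.duration_known:
        return Severity.HIGH
    # Non-critical and judged minor: only part of the plan is needed.
    if not assessment.business_critical and assessment.assessed_minor:
        return Severity.LOW
    # Everything else is disruptive but contained.
    return Severity.MEDIUM


if __name__ == "__main__":
    partial_fire = Assessment(
        business_critical=True,
        assessed_minor=False,
        under_control=True,
        duration_known=True,
    )
    tier = classify(partial_fire)
    print(tier.value, "-", ACTIVATION[tier])
```

In practice the inputs would come from the crisis management team's judgment rather than simple flags, so a rule like this is better treated as a checklist aid than as an automated decision.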
Conclusion
A well-executed recovery strategy could mean the difference between your organization bouncing back after a disaster or never recovering. While there are countless potential disasters you can face, if you can categorize them into two or three simple groups and assess the severity of the risks, you will find it much easier to respond quickly and effectively when something does happen.
Subscribe here to get notified when Part 12 is published and to receive updates on our upcoming Akrogoniaios Technologies toolkit.