Backup and Disaster Recovery on Microsoft Azure

by | May 18, 2021 | Blog

The need for disaster recovery planning (DRP) has never been clearer than in recent years. Besides shutdowns due to the COVID-19 pandemic, Queensland organisations have faced everything from flooding and hail storms to devastating bushfires.

Though most businesses have disaster recovery plans in place, even the most detailed plans may not have accounted for the magnitude of these disasters or how long they could keep staff out of the office.

But the middle of a disaster is hardly the best time to revisit the formal disaster recovery planning process. Instead, organisations must prioritise identifying and mitigating the top-tier issues and unanticipated blind spots that could negatively impact their operations.

In support of this mission, the team at Codify have identified ten specific steps that businesses must take in a disaster, drawing on the priorities we’ve seen have the greatest impact at Noosa Shire Council, QLeave, and other Queensland organisations.

And if you’re not currently in the midst of a disaster? Consider a hypothetical scenario. Think through whether your existing DRP would have accounted for each step, as well as how Microsoft Azure could help you maintain seamless service delivery in an emergency.

 

Step #1: Assess the Extent and Impact of the Disaster

As soon as possible after a disaster has occurred, conduct an ad-hoc analysis of the affected systems, equipment, facilities and/or personnel.

  • Evaluate your department’s ability to deliver “business as usual” service
  • Estimate the time and resources needed to recover key services, if required
  • Create a list of key stakeholders and provide them with a status update and estimated timeline for service delivery
  • Identify a point person who will be responsible for communicating with company leadership, team members, media, legal counsel, etc.

The middle of a disaster isn’t the time to fully transition your infrastructure to the cloud. However, it’s worth noting that a platform like Microsoft Azure can be set up quickly and enables DRP on a consumption basis (paying only for what you use removes the high entry costs of traditional DR environments). A cloud-native DR solution can also complement existing SaaS systems, making long-term remote work more viable, if necessary.

 

Step #2: Identify Any Disaster-Related Provisions in Your SLAs

Your service-level agreements (SLAs) with vendors, such as outsourced IT firms or datacentres/colocation facilities, should define service expectations in the event of an emergency.

  • If you aren’t aware of the coverage you’re entitled to, review your SLAs at the onset of an emergency
  • Identify alternate solutions for handling gaps, if necessary
  • If your SLAs don’t include disaster recovery terms, negotiate to have them added as soon as you can

At the outset of the COVID-19 outbreak, for instance, Microsoft saw significant demand across all their cloud services, leading them to enforce a capacity request system that ensured quota made it to those who needed it the most. Behind health and critical infrastructure organisations, government was prioritised to ensure continuity of public services.

If your business isn’t in one of these sectors, having alternate solutions is even more important.

 

Step #3: Optimise the Security of Key Stakeholders

Modern device usage practices mean that, today, each team member’s identity is the perimeter you may be called upon to protect.

  • Consider whether a mobile device management or mobile application platform may be beneficial
  • Ensure multi-factor authentication is turned on wherever possible
  • Limit access to business data and software on a per-user basis so that users are only able to access what’s needed to fulfill their role
  • Develop a blacklist of devices and/or applications that cannot be appropriately secured
  • Educate those using personal devices on security best practices, including password standards, malware/phishing attack prevention, etc.
  • Develop a data backup strategy that ensures data stored locally on user devices can be quickly restored
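
The multi-factor and per-user access points above can be sketched in a few lines of Python. This is a minimal illustration only, not Azure AD's API: in Azure AD these rules are configured as policies rather than written as code, and every role and resource name below is hypothetical.

```python
# Hypothetical role-to-resource mapping; all names are illustrative only.
ROLE_PERMISSIONS = {
    "customer_service": {"crm", "softphone"},
    "finance": {"payroll", "ledger"},
    "it_admin": {"crm", "softphone", "payroll", "ledger", "device_console"},
}


def can_access(role: str, resource: str, mfa_verified: bool) -> bool:
    """Allow access only when MFA has passed and the role needs the resource."""
    if not mfa_verified:
        return False
    return resource in ROLE_PERMISSIONS.get(role, set())


def excess_grants(role: str, granted: set) -> set:
    """Return resources a user holds beyond what their role requires."""
    return granted - ROLE_PERMISSIONS.get(role, set())
```

Running `excess_grants` over an export of current permissions is one way to spot access that could be trimmed back before staff move to personal devices.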

Azure Active Directory (AD), Azure Windows Virtual Desktop (WVD), and Microsoft Intune can all support this step. Azure AD provides cloud-based identities with native multi-factor authentication and conditional access policies, creating a seamless experience for users across multiple trusted devices and/or locations. With Azure WVD, businesses can safely extend their corporate network to staff at home, while Intune can be used to support mobile device management (MDM) and mobile application management (MAM) needs for those working remotely.

As Noosa Shire Council, which leverages Azure WVD, discovered, “The virtual desktop allows the Council to use staff home PCs while protecting Council data against ransomware and viruses. Staff don’t need to take data offsite via USB or onto personal PCs. Their familiar work desktop including shortcuts is available.”

 

Step #4: Source Any Resources Necessary to Sustain Long-Term Remote Work Arrangements

If it’s likely that disaster conditions will persist for more than a few days, ensure all staff members have the resources needed to sustain long-term remote work.

  • Make a list of all apps, software programs, etc., used by the organisation to conduct business
  • Develop a register to track equipment that is issued or loaned out during the disaster
  • Inventory all existing equipment (if this hasn’t been done before), and ensure any that’s already provisioned to staff has been tracked
  • Survey key stakeholders to identify the requirements of those whose current arrangements are unsuitable for long-term remote work
  • Consider whether migrating to the cloud through a service like Codify could be beneficial from the perspectives of cost, security, and long-term sustainability

Under the pressures of a disaster, agility should be favoured over ‘perfection’. Leveraging the Azure platform, changes can be rolled out in an incremental and automated manner to achieve continuous improvement.

 

Step #5: Ensure Key Stakeholders Can Communicate Securely Whilst Away from the Office

Local councils and government bodies may not be able to legally meet via video conference or phone, but most other organisations can. Ensure these and other operations are conducted securely.

  • Potential communication resources required may include virtual desktops, VPNs, electronic signature tools, video conferencing, secure document sharing, etc.
  • Identify potential security risks with the communication solutions you currently have in place
  • Research alternatives with stronger security features in place
  • If appropriate, purchase and implement more secure communication solution alternatives, including providing access to sensitive documents on-premises

During the COVID-19 pandemic, one Codify client found that video communication quickly became a normal part of their work. Conducting video calls securely through Microsoft Teams using the same identity as Azure WVD ensured easy accessibility by their staff, as well as the appropriate protection of sensitive information.

 

Step #6: Conduct Any Trainings Necessary to Upskill Key Stakeholders on Remote Work Technology

Providing team members who are struggling to adapt to remote work technology with appropriate training can help maintain productivity in a disaster.

  • Connect with key stakeholders (formally or informally) to identify team members who are struggling, as evidenced by work delays, missed deadlines, or other delivery failures
  • If needed, conduct remote training with team members to help them ramp up to new systems or processes

Preparing an extranet page as a single location for useful links, instructions, training, and other resources can make disseminating important information easier. As an example, one council Codify worked with created a central extranet on a memorable, locally significant URL that redirected staff to a secure SharePoint Online site which could be updated regularly.

 

Step #7: Ensure Your Website Can Support Additional Load

Especially if you anticipate website demand will increase during a disaster, take any steps necessary to maintain uptime and usability.

  • Consult with various departments to understand any new features or functionalities that will be rolled out as part of the remote work transition
  • Set uptime and page speed goals that are reasonable for your circumstances
  • Conduct load testing as needed to identify potential weaknesses before they result in downtime
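
As a rough illustration of the load testing and goal-setting points above, the sketch below sends a batch of concurrent requests and compares the results against an error-rate goal and a 95th-percentile response-time goal. It is a simplified sketch using only the Python standard library, not a substitute for a dedicated load testing tool. The function names and thresholds are assumptions, and it should only be pointed at systems you are authorised to test.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load_test(request: Callable[[], object], total_requests: int, concurrency: int) -> dict:
    """Call `request` total_requests times (must be > 0) across worker threads."""

    def timed_call(_: int):
        start = time.perf_counter()
        try:
            request()
            ok = True
        except Exception:
            ok = False  # any exception counts as a failed request
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for elapsed, _ in results)
    failures = sum(1 for _, ok in results if not ok)
    # Nearest-rank 95th percentile of response times.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "requests": total_requests,
        "failures": failures,
        "error_rate": failures / total_requests,
        "p95_seconds": latencies[p95_index],
    }


def meets_goals(summary: dict, max_error_rate: float, max_p95_seconds: float) -> bool:
    """Check a summary against the uptime and page speed goals you have set."""
    return (
        summary["error_rate"] <= max_error_rate
        and summary["p95_seconds"] <= max_p95_seconds
    )


# Example with a hypothetical staging URL (left commented out, as it needs network access):
# from urllib.request import urlopen
# summary = run_load_test(
#     lambda: urlopen("https://staging.example.org", timeout=10).read(), 200, 20
# )
# print(meets_goals(summary, max_error_rate=0.01, max_p95_seconds=2.0))
```

Passing the request in as a function keeps the sketch independent of any particular site, so the same check can be repeated as new features are rolled out.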

Bringing websites in-house on Azure, for example, can not only provide a resilient service but also add flexibility by scaling up on demand. Further, Azure has first-party support for most CMS products on Platform-as-a-Service (PaaS) technologies, so IT won’t need dedicated support professionals for one-off instances of open source databases and similar components.

 

Step #8: Support the Transition of Critical Functions to Online or Phone Operations

Transitioning functions such as customer service centres to remote operations is likely to increase demand on your systems.

  • Document what exactly is being transitioned online, as well as the resources required to support them throughout periods of peak usage
  • Consider both security and usability when developing new online systems
  • Implement mechanisms for user error reporting in any new features developed
  • Determine how to prioritise these errors in your list of pending bug fixes, relative to other disaster-related needs

Consider the example of QLeave. As a state government statutory body, QLeave provides portable long service leave to the construction, cleaning, and childcare workers of Queensland. In the early days of the COVID-19 pandemic, QLeave played a critical role for these workers, who were directly affected by the lockdown, even as lockdown measures forced the organisation to run its entire call centre from softphones.

“Our mission of helping workers in Queensland’s contract cleaning industry access their long service leave payments became even more critical amidst the COVID-19 pandemic, as these workers were often directly affected by facilities lockdowns. Codify’s Azure implementation allowed QLeave to run our entire call centre from softphones, ensuring workers were protected and that our organisation’s security and service delivery weren’t compromised.”
Robert O’Brien, IT Manager, QLeave

 

Step #9: Plan for Redundancy in Technical Support Functions in the Event of Higher-Than-Usual Levels of Absenteeism

IT processes that depend on a single person (or a few people) represent a potential point of failure if key stakeholders are out of the office unexpectedly.

  • List out critical IT processes, noting which staff members are trained to carry them out
  • Where only one person can support a given function, identify another team member who can provide backup support
  • Train newly identified backup personnel on new functions
  • Repeat the process regularly to ensure new IT processes are captured and backed up
  • As you’re able to, develop written documentation for each process for further redundancy
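
The first two points above amount to a simple coverage check, which could be sketched as follows. This is an illustrative example only: the process and staff names are hypothetical, and the threshold of two trained people is an assumption you may want to raise for your most critical functions.

```python
def single_points_of_failure(coverage: dict, absent: set = None) -> dict:
    """Return processes with fewer than two available trained staff members.

    `coverage` maps each IT process to the set of people trained to run it;
    `absent` is the set of people currently unavailable.
    """
    absent = absent or set()
    at_risk = {}
    for process, trained in coverage.items():
        available = trained - absent
        if len(available) < 2:
            at_risk[process] = available
    return at_risk


# Hypothetical register of critical IT processes and trained staff.
coverage = {
    "backup_restore": {"Alex", "Sam"},
    "firewall_changes": {"Alex"},
    "server_patching": {"Sam", "Priya"},
}

print(single_points_of_failure(coverage))
print(single_points_of_failure(coverage, absent={"Sam"}))
```

The second call shows how a single absence can turn processes that looked covered into single points of failure, which is why the register is worth revisiting regularly.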

Though many organisations have, in the past, relied on hiring ‘unicorns’ with experience in the specific server, storage, and/or networking technologies they’d already adopted, demand for public cloud has driven Microsoft to redevelop its certification program to a role-based model. The result is that organisations should be able to easily find skilled talent with experience in the specific Azure resources they’re already using, due to significant growth in certified Azure professionals.

 

Step #10: Plan for the Continued, Safe Operation of Any Equipment or Facilities that Remain Open

Ensure any on-premises hardware or other systems remain operational, even if you aren’t able to staff facilities as usual.

  • Create a ‘minimum viable schedule’ that presents the lowest possible amount of risk to team members
  • If necessary, develop a regular sanitation schedule and provide PPE for on-site staff

Ultimately, this article isn’t intended to take the place of a formal DRP. Instead, we hope you can use it to fill any gaps you encounter in your existing plan, whilst also ensuring that you stay current on local news, policy updates, and other changes brought about by the fluid nature of disaster situations.

Want to learn more about how Microsoft’s Azure platform can support your backup and disaster recovery planning efforts? Reach out to the team at Codify for more.

Ready to connect with Codify to discuss your next cloud project?
