
Re-onboarding Azure Local VMs After Disaster Recovery

by | 18 Jun, 2026 | Blog

Disaster recovery planning usually focuses on one outcome: keeping workloads running when something goes wrong.

For organisations running Azure Local, failing over to Azure is a well-established approach.

Replication tools and migration workflows are designed to get systems back online quickly, often with minimal disruption.

But there’s a second phase that gets far less attention.

What happens when you need to bring everything back?

 

The scenario

An Azure Local cluster required a full rebuild following issues introduced during a December update. Workloads were failed over to Azure to maintain availability during this process. 

The update added one of the compute interfaces to the management intent on one node, causing all network traffic to drop. The other nodes in the cluster then tried to replicate this invalid configuration and consistently failed to do so.

Once the new cluster was ready, workloads were migrated back using Azure Migrate. 

After migration:

  • Virtual machines were running
  • Azure resources were created
  • The environment appeared healthy

Despite this, the virtual machines were not functioning correctly as Azure Local workloads. We were unable to configure or install VM extensions such as Azure Update Manager and Azure Monitor, which severely limited management, monitoring and security of the workloads.

Additionally, after some time, the virtual machines became unlicensed.

 

The root cause

The issue was related to how Azure Local VMs integrate with Azure Arc. 

During migration: 

  • A new cluster was built and workloads were brought back to it via Azure Migrate, rather than failed back with Azure Site Recovery (ASR). The old cluster no longer existed, so the replication link was broken and ASR failback was not available.
  • A new Azure Arc logical resource was created via Azure Migrate 

Inside the virtual machine: 

  • The existing in-guest agent was still configured for the previous cluster and its Azure Arc (Azure Local) logical resources
  • The agent was not connected to the new logical resource 

This created a mismatch between the Azure resource and the in-guest configuration. 

Common signs included: 

  • Guest management was not enabled, and attempts to enable it failed
  • Virtual machine extensions, such as Azure Update Manager, could not be configured
  • Azure Local workloads became unlicensed after some time

The virtual machine existed in Azure and could be managed at a basic level, but it was not fully onboarded as an Azure Local VM. 

 

Why this happens

Azure Local relies on an in-guest agent to manage the connection between the virtual machine and Azure Arc.

That agent stores configuration across several locations within each guest.

This configuration data includes:

  • Local file system data
  • Certificates
  • Service configuration
  • Registry entries

These configurations are tied to the original cluster and logical resource.

After a rebuild:

  • A new logical resource is created, with a new System Assigned Managed Identity. 
  • The virtual machine still holds references to the previous environment

As a result, the agent tries to connect to an invalid or non-existent resource, using a System Assigned Managed Identity that no longer exists.
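The mismatch described above can be illustrated with a small comparison. This is a minimal sketch rather than the agent's actual logic: the field names (subscriptionId, resourceGroup, resourceName) and the sample values are assumptions for illustration only. In practice the two sets of values would come from whatever the in-guest agent reports and from the new logical resource in Azure.

```python
from typing import Dict, Tuple

# Assumed field names, chosen for illustration; confirm them against
# what the agent actually reports in your environment.
IDENTITY_FIELDS = ("subscriptionId", "resourceGroup", "resourceName")


def find_identity_mismatches(
    agent_state: Dict[str, str], expected: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {field: (agent_value, expected_value)} for each field that differs.

    Values are compared case-insensitively with surrounding whitespace removed.
    A field missing on either side is treated as an empty string.
    """
    mismatches = {}
    for field in IDENTITY_FIELDS:
        agent_value = agent_state.get(field, "")
        expected_value = expected.get(field, "")
        if agent_value.strip().lower() != expected_value.strip().lower():
            mismatches[field] = (agent_value, expected_value)
    return mismatches


if __name__ == "__main__":
    # Hypothetical values: the guest still points at the old cluster's resource group.
    agent_state = {
        "subscriptionId": "sub-0001",
        "resourceGroup": "rg-old-cluster",
        "resourceName": "vm-app01",
    }
    expected = {
        "subscriptionId": "sub-0001",
        "resourceGroup": "rg-new-cluster",
        "resourceName": "vm-app01",
    }
    for field, (found, wanted) in find_identity_mismatches(agent_state, expected).items():
        print(f"{field}: agent has {found!r}, Azure expects {wanted!r}")
```

An empty result means the guest and the Azure resource agree on the fields checked. Any entry in the result is a sign the guest is still tied to the previous environment.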


Disclaimer: This approach was developed while assisting a customer with Azure Local recovery and is based on our own troubleshooting experience. It reflects what we observed as at April 2026 and may not apply to all environments. No warranty is provided by Codify or Microsoft. 

Feel free to reach out if you find yourself in a similar situation. We’re more than happy to help.

 

Resolution approach

Fixing this issue requires re-onboarding the virtual machine to the correct Azure Arc resource.

At a high level, the process includes:

  • Removing the existing guest agent
  • Deleting associated configuration data and registry entries
  • Reinstalling the agent using the correct cluster configuration
  • Re-registering the virtual machine with the new logical resource
  • Validating that the agent is connected and guest management is enabled

Once complete:

  • The virtual machine appears correctly in Azure
  • Azure Arc integration functions as expected
  • Management features are available and VM extensions can be installed
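The ordering of the steps above matters: there is little point reinstalling the agent if the old one was not removed cleanly. The sketch below shows one way to run the steps in sequence and stop at the first failure. It is not the published script. The `placeholder` function is a hypothetical stand-in for each real action, which would be specific to your environment.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_steps(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first failure.

    Returns (all_succeeded, log). Later steps are not attempted after a
    failure, so a half-removed agent is never reinstalled over.
    """
    log: List[str] = []
    for name, action in steps:
        try:
            succeeded = action()
        except Exception as exc:  # surface the error instead of crashing the run
            log.append(f"FAILED: {name} ({exc})")
            return False, log
        if not succeeded:
            log.append(f"FAILED: {name}")
            return False, log
        log.append(f"ok: {name}")
    return True, log


def placeholder() -> bool:
    """Hypothetical stand-in for a real action; always reports success."""
    return True


# Step names follow the high-level process; the actions are placeholders.
REONBOARD_STEPS: List[Step] = [
    ("remove existing guest agent", placeholder),
    ("delete residual configuration and registry entries", placeholder),
    ("reinstall agent with the new cluster configuration", placeholder),
    ("re-register VM with the new logical resource", placeholder),
    ("validate agent connection and guest management", placeholder),
]

if __name__ == "__main__":
    ok, log = run_steps(REONBOARD_STEPS)
    print("\n".join(log))
    print("re-onboarding complete" if ok else "stopped early, investigate manually")
```

Stopping early keeps a failed VM in a known state, which makes manual troubleshooting easier than untangling a run that carried on regardless.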

 

Automation and manual process 

The re-onboarding process can be automated. 

In this environment, a script was used to: 

  • Remove the existing agent 
  • Install a new agent 
  • Re-enable the Azure Arc connection 

This approach is effective when working with multiple virtual machines. 

An understanding of the manual process is still beneficial, and can be required for specific virtual machine configurations.

That understanding involves:

  • Knowing where the agent stores configuration 
  • Understanding how the agent connects to Azure Arc 
  • Verifying the connection state inside Azure and within the VM 

Automation reduces effort but does not remove the need for troubleshooting.

 

Validation 

After re-onboarding, confirm that: 

  • Guest management is enabled and connected in Azure 
  • The agent is running without errors 
  • The virtual machine is associated with the correct logical resource 
  • Configuration reflects the new cluster, not the previous one 

Validation should be completed for each virtual machine. 
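Per-VM validation is easier to keep honest when each check is recorded explicitly. The following is a minimal sketch of that idea. The `VmCheck` fields are assumptions about what you would collect from Azure and from inside each guest, not a real API, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VmCheck:
    """Observed state for one VM; how each value is gathered is up to you."""
    name: str
    guest_management_connected: bool
    agent_running: bool
    agent_error_count: int
    resource_id: str
    expected_resource_id: str
    cluster_name: str
    expected_cluster_name: str


def failed_checks(vm: VmCheck) -> List[str]:
    """Return a description of every validation check this VM fails."""
    failures: List[str] = []
    if not vm.guest_management_connected:
        failures.append("guest management not enabled and connected")
    if not vm.agent_running or vm.agent_error_count > 0:
        failures.append("agent not running cleanly")
    # Compared case-insensitively, since casing can differ between sources.
    if vm.resource_id.lower() != vm.expected_resource_id.lower():
        failures.append("associated with the wrong logical resource")
    if vm.cluster_name.lower() != vm.expected_cluster_name.lower():
        failures.append("configuration still references another cluster")
    return failures


def validate_all(vms: List[VmCheck]) -> Dict[str, List[str]]:
    """Map each VM name to its failed checks; a healthy VM maps to an empty list."""
    return {vm.name: failed_checks(vm) for vm in vms}


if __name__ == "__main__":
    # Hypothetical sample: one healthy VM, one still tied to the old cluster.
    vms = [
        VmCheck("vm-app01", True, True, 0, "/res/new/vm-app01", "/res/new/vm-app01",
                "cluster-new", "cluster-new"),
        VmCheck("vm-db01", False, True, 0, "/res/old/vm-db01", "/res/new/vm-db01",
                "cluster-old", "cluster-new"),
    ]
    for name, failures in validate_all(vms).items():
        print(name, "OK" if not failures else "; ".join(failures))
```

Reporting every failed check, rather than only the first, gives a clearer picture of whether a VM was partially re-onboarded or never touched.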

 

Lessons worth planning for 

This situation highlights a few things that are easy to underestimate. 

Failback needs as much planning as failover

  • Most DR strategies are heavily weighted towards getting out of an incident. In the event of a total cluster rebuild, bringing workloads back cleanly requires just as much thought.

VM identity doesn’t reset automatically 

  • Even after migration to a new cluster, VMs retain configuration tied to their original environment. That context needs to be reset deliberately. 

Azure Local introduces additional layers

  • Compared to standard Azure VMs, Azure Local adds integration points that must be correctly re-established after a cluster rebuild. 

Visibility doesn’t equal correctness

  • A VM appearing healthy in Azure doesn’t guarantee it’s fully integrated or manageable. 

Automation is valuable, but not sufficient

  • Scripts can accelerate recovery, but troubleshooting still depends on understanding the underlying components.

The Azure Local cluster itself needs a recovery plan 

  • DR planning often focuses on workloads. Azure Local clusters also require a defined rebuild and re-onboarding approach to ensure workloads function correctly after recovery. 

 

When this applies 

This becomes particularly relevant for organisations that: 

  • Run Azure Local or Azure Stack HCI environments 
  • Use Azure as part of a disaster recovery strategy 
  • Rely on Azure Migrate for workload movement 
  • Regularly patch or update underlying infrastructure 
  • Need predictable and repeatable recovery processes 

These environments benefit from treating failback as a first-class part of the design rather than an afterthought.

In practice, we used a repeatable script to remove the existing agent, clean residual configuration, and re-establish the Azure Arc connection. We’ve published a sanitised version here: 

View the re-onboarding script here

 

Summary

Migrating virtual machines back to Azure Local after a disaster recovery event can leave residual configuration inside the guest operating system. 

This configuration can prevent proper integration with Azure Arc. 

The issue is resolved by re-onboarding the virtual machine and ensuring the guest agent is aligned with the new environment. 

Planning for this step reduces recovery issues and avoids inconsistent management behaviour after failback. 

 

Need help with Azure Local recovery? 

If you’re planning or working through a similar scenario, it can help to have a clear approach to both failover and failback. We’re always happy to share what we’ve seen work in practice. 

Get in touch!  

