Things to think of before restoring a whole VM with Veeam

I recently went through something that taught me a bit about doing a full VM restore. I will admit this use case is unique, as I use Veeam in the lab to really support Veeam. Most people don’t have a “Veeam lab” – they may protect a lab with Veeam, or of course production workloads with Veeam, but rarely Veeam protecting Veeam.

What happened here is that I restored, and it didn’t just work. There were a number of reasons, which I will explain in this blog post (and which I’ll chalk up to sloppy administration: fault me), but at least one of them may be something someone can learn from. In the end, I (of course!) did not lose any data. But the restore took unnaturally long, as I had to do many of these steps over and over.

Scenario: I have a few upcoming Veeam features that are best demoed using nested ESXi. At the time I backed them up, I had them running as shown in the figure below:

[Figure: the lab as it was running at the time of the backups]

Now there are a few things to note with this diagram, so I’ve made a legend here to help explain it:

[Figure: legend for the diagram above]

You may wonder why I have 2 instances of Veeam Backup & Replication 9.5 Update 2 (the latest generally available edition with updates). The one on the left is there to show current, available features. The instance on the right is there to back EVERYTHING up. A lab by its nature is subject to change, and it is nice to have a completely out-of-band way to back it up. Note that it is also in a different cluster. When POD2 was built, it was very temporary and hastily assembled (re: the point above about the sloppy administrator – that’s me). Additionally, I was using a standard vSwitch (vs. a Distributed Switch) and only local VMFS datastores. I have had the same vCenter through it all for POD2 – so key things like the VM object ID have been retained. On the Veeam server on the right I also run a few jobs called “Safehouse”. I call them Safehouse because I have built jobs for key parts of the lab to be backed up and restored. One Safehouse backup job protects a complete infrastructure supporting our upcoming Veeam CDP feature, which is built on vSphere APIs for I/O Filtering (VAIO). For now, it’s much easier to show this with a nested ESXi host, and I have a Safehouse job to back it and its Veeam Backup & Replication server up (one of the “vNext Betas and Test Stuff” VMs).
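As an aside, the VM object identity is easy to verify for yourself. Below is a minimal pyVmomi sketch, assuming a reachable vCenter; the hostname, credentials, and VM name are placeholders, not my actual lab values. It records a VM’s managed object ID and instance UUID, so you can confirm vCenter still knows the VM by the same identity after a restore:

```python
# Sketch: record a VM's vCenter identity (moRef and instance UUID) before a restore.
# Uses the open-source pyVmomi SDK; hostname/credentials/VM name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only: skip certificate validation
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.name == "nested-esxi-01":
            # The moRef (e.g. 'vm-123') and instanceUuid survive as long as
            # the same vCenter keeps tracking the VM.
            print(vm._moId, vm.config.instanceUuid)
    view.Destroy()
finally:
    Disconnect(si)
```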

I run the Safehouse jobs based on milestones, and I have a separate job on the Veeam server on the right that backs up the entire vCenter on a schedule (maybe once a week; it’s a lab). The milestones are known-good working configurations for whatever is being protected, in this case Veeam CDP. This one was marked in a Safehouse job from May, right before VeeamON.

Since then, the lab has indeed behaved like a lab, and POD2 has taken on some really good changes. The evangelists and I have gone all-in on VMware vSAN, Anthony has set up a nice vNetwork Distributed Switch, and we have a really good platform to work with. The new cluster looks something like this:

[Figure: the new cluster layout]

But I needed to restore one of the Safehouse jobs, and I figured – I have a backup; should be fine to restore! Well, check the logs and see what I had to go through:

[Figure: restore session logs]

I had a number of failed restores, then a number of successful restores that I had to re-do. What happened? Well, that is the point of this blog post: things to think about before a restore! I lost no data in the end, but it was frustrating – in a fun way – because A) I learned a few things and B) I get to write this blog post and let you all know so you don’t have this problem. Here is the short list of what went wrong:

  • Since the Safehouse backup job ran, we accidentally (separate discussion) pushed out Veeam Agent for Windows to every computer in the domain (it’s a lab, remember). This put different Veeam components in play alongside the generally available 9.5 Update 2. The takeaway: don’t push the agent out to systems that host other Veeam components, and keep the revisions in mind (a quick version check like the first sketch after this list can help).
  • Since the Safehouse backup job ran, the networking and storage have all changed. I found a bizarre issue that prevented a nested ESXi host running on a VMFS datastore from being moved to a vSAN datastore via a Veeam restore. I need to go through this a bit more and share it with R&D, but the short answer is that this nested ESXi host has had its local setup modified for the Veeam I/O filter (the GA version of Veeam CDP will do this in a proper installer; I installed the components by hand). On vSAN it somehow redetected its local storage and wouldn’t recognize its local datastore there. When I put it on an iSCSI VMFS volume instead, it boots up fine.
  • I use a lot of DHCP, and vCenter doesn’t like new IPs. I should have had very long lease times for this lab, but I didn’t. When vCenter came up with a new IP, it either had to be reconfigured with local hosts files and the like, or you can cheat and add a DHCP reservation mapping its MAC address back to the old IP if it has been off long enough to otherwise get a new address (tricky if the old IP has since been claimed). Again, this proves that DNS is still very important to vCenter (see the second sketch after this list).
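On the first point, here is a minimal sketch, assuming a Windows host with Python, that lists installed Veeam components and their versions from the standard uninstall registry keys, so mismatched revisions stand out before you push anything domain-wide:

```python
# Sketch: list installed Veeam components and versions on a Windows host,
# so mismatched revisions (e.g. agent vs. Backup & Replication) stand out.
# Reads the standard Windows uninstall registry keys.
import winreg

UNINSTALL_KEYS = [
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall",
    r"SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall",
]

def veeam_components():
    for path in UNINSTALL_KEYS:
        try:
            root = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path)
        except OSError:
            continue
        for i in range(winreg.QueryInfoKey(root)[0]):
            try:
                sub = winreg.OpenKey(root, winreg.EnumKey(root, i))
                name, _ = winreg.QueryValueEx(sub, "DisplayName")
                version, _ = winreg.QueryValueEx(sub, "DisplayVersion")
            except OSError:
                continue
            if "Veeam" in name:
                yield name, version

for name, version in veeam_components():
    print(f"{name}: {version}")
```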
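On the DHCP/DNS point, a quick sanity check of forward and reverse lookups for the vCenter FQDN after a restore can save a lot of digging. Here is a second minimal sketch using only the Python standard library; the FQDN and expected IP are placeholders, not my lab values:

```python
# Sketch: sanity-check forward and reverse DNS for vCenter after a restore.
# The FQDN and expected IP below are placeholders, not real lab values.
import socket

FQDN = "vcenter.lab.local"
EXPECTED_IP = "192.168.1.50"  # the address the restored vCenter should keep

ip = socket.gethostbyname(FQDN)
print(f"forward: {FQDN} -> {ip}",
      "(OK)" if ip == EXPECTED_IP else "(MISMATCH - check DHCP lease/reservation)")

try:
    # gethostbyaddr returns (hostname, aliases, addresses)
    name, _, _ = socket.gethostbyaddr(ip)
    print(f"reverse: {ip} -> {name}",
          "(OK)" if name.lower().rstrip('.') == FQDN else "(check the PTR record)")
except socket.herror:
    print(f"reverse: no PTR record for {ip} - vCenter will not be happy")
```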

Each one of these issues caused me to restart the restore a few times. The components error was pretty clear, since it failed right away and told me so; but the other two issues took a bit of digging. I got lazy and tried going back to an older restore point, but the same behavior happened. At that point, I was committed to finding the root cause of what was going on.

An additional thing to consider: if a VMware cluster takes on some serious changes (like POD2 becoming the MEGAPOD), you should test a few restores to make sure you don’t have any surprises; and for fiddlesome applications like vCenter, still never use DHCP. While this was all in a lab, I learned a bunch, and hopefully you can as well.
