Friday, February 11, 2011

More Efficient Storage of Virtual Machine Backups

Here's a little trick I use to save substantial amounts of disk space. 

For testing our software, we heavily rely on virtual machines, mostly running various versions of Windows - we use VMware (both VMware Workstation on PC and VMware Fusion on Mac), Parallels, and VirtualBox extensively. 

VirtualBox is my current favorite - it's come a long way since the earlier versions, and to me it's now on a par with VMware and Parallels. Furthermore, it has a number of very handy features that the other two lack, and above all, it's free!

The nice thing about virtual machines is that we can easily 'roll back' to a known state before starting a test session. All three of the tools mentioned above have a 'snapshotting' feature that makes 'rolling back' easy.

However, snapshots of virtual machines come at a cost: the more snapshots you pile up onto a single virtual machine, the slower that virtual machine becomes. 

To avoid the performance penalty, we instead often opt for a more primitive backup strategy: we regularly make compressed copies of a 'known good' virtual machine, and restore them as needed.
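As a rough illustration of the 'compressed copy' approach, here is a minimal Python sketch using the standard-library `gzip` and `shutil` modules. The choice of gzip is an assumption (any general-purpose compressor would do), and the function names are hypothetical. This is an illustration, not the exact tooling used here. The virtual machine should be powered off so its disk image isn't changing while it is copied.

```python
import gzip
import shutil
from pathlib import Path

CHUNK = 1024 * 1024  # stream in 1 MiB pieces so memory use stays flat


def backup_disk_image(image: Path, archive: Path, level: int = 6) -> int:
    """Stream-compress a virtual disk image into a .gz archive.

    Returns the size of the compressed archive in bytes.
    """
    with open(image, "rb") as src, gzip.open(archive, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst, length=CHUNK)
    return archive.stat().st_size


def restore_disk_image(archive: Path, image: Path) -> None:
    """Decompress a .gz archive back into a disk image file."""
    with gzip.open(archive, "rb") as src, open(image, "wb") as dst:
        shutil.copyfileobj(src, dst, length=CHUNK)
```

For example, `backup_disk_image(Path("known-good.vdi"), Path("known-good.vdi.gz"))` (hypothetical file names) makes the copy, and `restore_disk_image` reverses it. Streaming matters here because virtual disk images easily run to tens of gigabytes.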

Here's the trick I devised to make the compressed copies substantially smaller without loss of data: 

1) Get rid of all snapshots before backing up (so the virtual hard disk becomes a single, large file).

2) Optionally, clear any temp files and so on.

3) Use a 'wiper' tool to erase any unused disk space in the virtual machine before compressing. 

If you don't do this, any unused disk space on the virtual hard disk still holds the remnants of deleted files: more or less random bit patterns, which don't compress very well.

If you zero out that unused space instead, it compresses extremely well.
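To see the effect in isolation, here is a small Python sketch using the standard-library `zlib` module. It compresses a 1 MiB block of random bytes, standing in for leftover data in unused space, and a 1 MiB block of zeroes, standing in for the same space after wiping. Real leftover data is not perfectly random, so treat the random block as a worst case rather than a measurement of an actual disk.

```python
import os
import zlib

BLOCK = 1024 * 1024  # 1 MiB, a stand-in for a stretch of unused disk space

blocks = {
    "random": os.urandom(BLOCK),  # assumption: leftovers behave like random bytes
    "zeroed": bytes(BLOCK),       # the same stretch after wiping it with zeroes
}

sizes = {}
for label, block in blocks.items():
    sizes[label] = len(zlib.compress(block, 6))  # compression level 6, zlib's default
    print(f"{label:>7}: {len(block):>8} -> {sizes[label]:>8} bytes")
```

The random block comes out at essentially its original size, because the compressor finds no redundancy in it, while the zeroed block shrinks to a tiny fraction of a percent of its original size.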

For example, I just made two compressed backups of a heavily used virtual machine: first without erasing the unused space, then again after erasing it. The compressed file shrank from close to 80GB to 50GB, nearly 30GB smaller. How's that for savings?

I mostly use a Windows-based tool called Eraser - it's free and open source.

The main caveat is that you have to make it erase any unused space with zeroes. 

Eraser's main purpose is to make erased data irretrievable for security purposes, and by default it fills any unused disk space with random bit patterns, but that is not what we're after here.

To make Eraser work for me, I first had to add a 'custom' bit pattern (which is simply a zero).

There are a few variants of Eraser available. Don't use the most recent 6.x version: it apparently won't let you fill unused space with zeroes and insists on using random bit patterns instead, which is useless for our purpose.

Instead, you have to download the older version (I used 5.8.8), which allows you to define your own erasure patterns.


After installing, you need to select the Edit - Preferences - Erasing... menu. Then add a new method, call it 'Zero' and set it to use a zero as the bit pattern. 

Next, erase the unused space with this pattern. Finally, shut down the virtual machine and compress it.
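Eraser is a Windows GUI tool, but the underlying idea can be sketched in a few lines of Python: fill the free space with a file full of zeroes, force it to disk, then delete the file. This is a generic technique swapped in for illustration, not Eraser's own implementation. It only reaches space the file system will hand to an ordinary file (not, for example, the unused tail of partially filled clusters), so it may be less thorough than a dedicated wiper. The function name and the filler file name are hypothetical, and the `limit` parameter exists only so the function can be tried without filling a whole disk. Run it inside the guest, before shutting it down.

```python
import errno
import os
from pathlib import Path
from typing import Optional

CHUNK_SIZE = 4 * 1024 * 1024  # write zeroes 4 MiB at a time


def zero_fill_free_space(directory: str, limit: Optional[int] = None) -> int:
    """Overwrite free space on the file system holding `directory` with zeroes.

    Writes zeroes to a temporary filler file until the disk is full (or until
    `limit` bytes are written), syncs it, then deletes it so the space is free
    again but now contains zeroes. Returns the number of bytes written.
    """
    filler = Path(directory) / "zero-fill.tmp"  # hypothetical filler file name
    zeroes = memoryview(bytes(CHUNK_SIZE))      # one reusable buffer of zero bytes
    written = 0
    try:
        with open(filler, "wb", buffering=0) as f:
            while limit is None or written < limit:
                size = CHUNK_SIZE if limit is None else min(CHUNK_SIZE, limit - written)
                try:
                    written += f.write(zeroes[:size])
                except OSError as exc:
                    if exc.errno != errno.ENOSPC:
                        raise
                    break  # disk full: no more free space to hand out
            try:
                os.fsync(f.fileno())  # push the zeroes from the OS cache onto the disk
            except OSError as exc:
                if exc.errno != errno.ENOSPC:
                    raise
    finally:
        filler.unlink(missing_ok=True)  # release the space again
    return written
```

Inside a guest, something like `zero_fill_free_space(tempfile.gettempdir())` would target the drive holding the temp directory. The guest disk runs completely full for a moment, so close other programs first.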


