How to use backups for e-discovery

The first rule of using backups for e-discovery is: don’t use backups for e-discovery – if you can avoid it. If you find yourself performing e-discovery on a regular basis, or are required by regulation to keep certain types of data (e.g. email) for a period of time (e.g. 7 years), then you should really look into an archive system.

But if you’re reading this article, you’ve probably been given an e-discovery request and do not have an archive system. Or maybe the e-discovery request specifically calls out extracting data from backup. Whatever the reason, if you’re faced with the prospect of performing e-discovery using backups as your source, you’re in for a treat, since backups were never really designed to do that.

This article will lay out the high-level process you will need in order to do this task yourself, and tell you about an alternative process that is simpler, quicker, and less expensive. Remember that an e-discovery request that takes too long can actually cost you the case. Check out the story of Coleman vs. Morgan Stanley to see just how bad things can be.

Retrieval via restore

Backups are designed for restores: putting a server, file, or database back the way it looked yesterday. But what you need is to extract, or retrieve, hundreds or thousands of emails or files spanning a long period of time. To do this, you’re going to need to do many restores and extract what you need from the restored data, and this is going to be quite the task. Here’s a summary of what you’re in for:

Duty to preserve (AKA legal hold)

If you have been given an e-discovery request, you have a legal duty to immediately preserve any media that may contain the requested records. This includes protecting it from your backup system’s automatic expiration process, which happens every single day as a normal process. Failure to properly preserve such data can cost your company any case requiring such data. As quickly as you can, you must figure out which tapes, disk backups, or cloud backups may contain the requested records, and ensure they do not get deleted by this automatic process. This is especially important for disk/cloud backups, as they are typically deleted when they expire. Ask your backup provider the best way to do this. This could be as simple as being able to put all backups from a particular host on legal hold until further notice. It could be as complicated as placing individual backup images on hold.
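How you actually place a hold depends entirely on your backup product, so the following is only a sketch of the selection logic, not any vendor's API. It assumes you can export the backup catalog into records with a host, a backup date, and an expiration date (the `BackupImage` class and its field names are hypothetical), and that any image taken on or after the start of the requested period might contain relevant records.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BackupImage:
    # Hypothetical catalog record; real products expose different fields.
    image_id: str
    host: str
    backup_date: date
    expires: date
    on_hold: bool = False


def images_needing_hold(catalog, hosts, period_start):
    """Return images from the given hosts, taken on or after period_start,
    that are not yet on hold, ordered so the soonest to expire come first."""
    candidates = [
        img for img in catalog
        if img.host in hosts
        and img.backup_date >= period_start
        and not img.on_hold
    ]
    return sorted(candidates, key=lambda img: img.expires)
```

Sorting by expiration date puts the images closest to automatic deletion at the top of the list, which is where the hold needs to be applied first.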

Points in time

You’re going to be performing dozens to hundreds of restores of the item you’re retrieving (e.g. Exchange, filesystem). Each of those restores will be to a particular point in time. For the restore to work, you will need to recreate the conditions of that point in time, including the operating system and application versions.

Email application

Many (if not most) e-discovery requests involve email, and restoring email requires the application that reads that email (e.g. Exchange).  You must have a working copy of the appropriate version of that application, or the restore and subsequent extraction won’t work.

Client Platform/OS

Each version of the application you will need for the above step will require a certain version of the operating system.  It is sometimes possible to run an old version of an app on a newer version of the OS (and vice versa), but not always.

Backup application

Backup software also changes over time.  Really old backups can sometimes only be restored using versions of the backup software from that time, which may include the version of the backup agent for the app you are restoring.

Sufficient space

You must ensure there is sufficient space for the restore you have planned, as well as space to perform any extractions.
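A quick pre-flight check can catch a space shortfall before a long restore starts. This is a minimal sketch using Python's standard `shutil.disk_usage`; the size estimates would have to come from your backup catalog, and the 10% headroom is an arbitrary assumption rather than a recommendation.

```python
import shutil


def required_bytes(restore_bytes, extraction_bytes, headroom_percent=10):
    """Space needed for the restore plus the extractions, with some headroom.
    The default headroom figure is an assumption, not a rule."""
    total = restore_bytes + extraction_bytes
    return total * (100 + headroom_percent) // 100


def has_room(path, restore_bytes, extraction_bytes, headroom_percent=10):
    """True if the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes(restore_bytes, extraction_bytes, headroom_percent)
```

You would call `has_room` with the path of your restore staging area and the estimated sizes, and stop before starting the restore if it returns `False`.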

Tape or cloud retrieval

You need to recall or download all off-site backups necessary to perform each restore. This may involve retrieval fees from your offsite storage vendor (e.g. Iron Mountain) or download fees from your cloud vendor.

Restore

Once you have recreated the environment that a particular backup was taken in, you can restore it.  This will be an alternate-server restore (since you are not restoring back to the original server the backup was taken from), so you will need to become an expert in one of the most difficult types of restores.  Once you become familiar with the logistics of this part (after you’ve done a handful of them), this part will get a lot easier.

Extraction(s)

Once you have a working copy of the file or email system you need to extract from, you will need to perform one or more extractions to get the emails or files you are looking for. For example, in Exchange you will perform a search and create a PST file for the result. If the extraction contains many elements (e.g. multiple users’ emails), you will need to perform multiple extractions, each of which will create its own archive that you will add to the collection of files. There will likely be quite a bit of duplicate data between the files, making the subsequent culling step much harder.

Rinse and repeat

You must repeat each of the above steps for every point in time you have a backup for during the period of time the e-discovery is for.  For example, I once worked on a project that involved extracting all emails over a three-year period.  They had a weekly full backup of Exchange, so we performed 156 (52 x 3) restores of Exchange.
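To see how quickly the restore count grows, it helps to list the restore points up front. The sketch below assumes one full backup per week on a fixed day; the function name and the dates you pass in are illustrative, and your own schedule should come from the backup catalog.

```python
from datetime import date, timedelta


def weekly_restore_points(first_backup, last_backup):
    """List the date of every weekly full backup from first_backup
    through last_backup, inclusive."""
    points = []
    current = first_backup
    while current <= last_backup:
        points.append(current)
        current += timedelta(weeks=1)
    return points
```

Called with a first backup date and a last backup date 155 weeks later, it returns 156 dates, matching the 52 x 3 restores in the project described above.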

Culling

If you plan on sending the results to an e-discovery service, remember that they charge by the record. If there is any way you can merge duplicate emails or files, you will save your organization a lot of money.
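One possible way to cull duplicate emails is to key each message on its Message-ID header, falling back to a hash of a few headers and the body when that header is missing. This sketch assumes the messages have already been exported from the PST files into standard RFC 5322 form and parsed with Python's `email` module (reading PSTs directly needs third-party tooling that is not shown), and the choice of fallback fields is an assumption.

```python
import hashlib


def dedup_key(msg):
    """Identify a message by Message-ID, or by a content hash if it has none."""
    msg_id = msg.get("Message-ID")
    if msg_id:
        return msg_id.strip()
    # Fallback: hash sender, date, subject, and body (empty for multipart).
    body = msg.get_payload(decode=True) or b""
    digest = hashlib.sha256()
    for header in ("From", "Date", "Subject"):
        digest.update((msg.get(header) or "").encode("utf-8"))
        digest.update(b"\0")
    digest.update(body)
    return digest.hexdigest()


def cull_duplicates(messages):
    """Keep the first copy of each message, preserving the original order."""
    seen = set()
    unique = []
    for msg in messages:
        key = dedup_key(msg)
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique
```

Because weekly full backups of the same mailbox overlap heavily, most messages will show up in many of the extracted archives, so even this simple pass can shrink the record count considerably before anything is sent to a per-record service.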

The steps above are labor- and hardware-intensive. The e-discovery request I mentioned above (3 years of emails) involved 15 consultants working round the clock, took months, and cost millions of dollars, while also tying up servers and tape hardware and requiring a special arrangement with the backup software company to run multiple simultaneous copies of NetBackup. (Each needed a license.)

There is a better way.

Give it to S2|Data

S2|Data can directly extract emails and files from most common backup formats, creating a single archive that contains all emails that match the search parameters you specify, and optionally deleting any duplicate emails, leaving only one copy of each relevant email. You can also further cull this data using an online tool.

Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.
