Data Deduplication

Deduplication, also called single instance storage, is a method of optimizing your data usage. The obvious benefit is hard drive space, but it can also make a significant difference to network utilization, data backups, hard drive longevity and disaster recovery.

The simple explanation is this: suppose you have a lot of copies of the same file, say because everyone in the company is sent the updated version of the employee handbook. Instead of storing one copy of the handbook for each employee, a deduplicating system stores only one copy and gives everyone a pointer to that same file.
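The handbook example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SingleInstanceStore`, the file paths, and the choice of SHA-256 as the content fingerprint are all hypothetical and do not describe any particular product.

```python
import hashlib


class SingleInstanceStore:
    """Toy file-level deduplicating store (illustrative only)."""

    def __init__(self):
        self.blobs = {}     # content hash -> file bytes, stored once
        self.pointers = {}  # logical path -> content hash

    def put(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if this exact content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.pointers[path] = digest

    def get(self, path):
        # Follow the pointer to the single stored copy.
        return self.blobs[self.pointers[path]]


store = SingleInstanceStore()
handbook = b"Employee handbook, updated edition\n" * 1000

# Every employee "receives" the same file (hypothetical paths).
for employee in ("alice", "bob", "carol"):
    store.put(f"/home/{employee}/handbook.txt", handbook)

print(len(store.pointers), "logical files,", len(store.blobs), "stored copy")
# prints: 3 logical files, 1 stored copy
```

Each employee still sees a complete handbook, but the store holds the bytes once and three small pointers.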

A few deduplication options are limited to detecting identical files, but most now do block-level analysis, which can increase efficiency much more. If your company uses a standard form for terms and conditions, but each employee adds their name, phone extension and an image of their signature, block-level deduplication would recognize that the bulk of each file is identical and store only a single instance of it, while storing the blocks with the unique information separately. Or maybe you use a spreadsheet to track credit card expenses. When the file is updated, often only the last block changes, so your deduplication solution can set just that block for backup and storage, and your hard drive doesn't have to rewrite the whole file.
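Here is a minimal sketch of the block-level idea, assuming fixed-size 4 KB blocks and SHA-256 fingerprints. The block size, the `BlockStore` class and the file names are assumptions made for illustration; real products choose their own block sizes and hashing schemes.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real products vary


def split_blocks(data, size=BLOCK_SIZE):
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Toy block-level deduplicating store (illustrative only)."""

    def __init__(self):
        self.blocks = {}  # block hash -> block bytes, stored once
        self.files = {}   # file name -> ordered list of block hashes

    def put(self, name, data):
        """Store a file; return how many blocks were newly written."""
        new_blocks = 0
        recipe = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_blocks += 1
            recipe.append(digest)
        self.files[name] = recipe
        return new_blocks

    def get(self, name):
        """Rebuild a file from its list of block hashes."""
        return b"".join(self.blocks[d] for d in self.files[name])


# Shared terms-and-conditions body: three distinct 4 KB blocks.
body = b"".join(bytes([i]) * BLOCK_SIZE for i in range(3))
alice = body + b"Alice, ext. 101"
bob = body + b"Bob, ext. 102"

store = BlockStore()
written_alice = store.put("terms_alice.doc", alice)  # 4 blocks written
written_bob = store.put("terms_bob.doc", bob)        # only 1 new block
print(written_alice, written_bob, len(store.blocks))
# prints: 4 1 5
```

One caveat with this sketch: with fixed-size blocks, inserting data near the start of a file shifts every later block boundary, so the blocks that follow no longer match. Variable-size (content-defined) chunking is one way to handle that case.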

In these simple cases, the benefits are admittedly minimal. But if you are trying to manage a large database or a company-wide backup system, the impact can be huge. (Think of all the operating system files that are the same on every computer.) It is true that storage is cheap and always getting cheaper. Large-scale deployments, however, are not cheap, and if you are careful and efficient in your data usage, your deployments will be more scalable, more reliable, and more profitable.

Deduplication Options:

In smaller systems, you might be able to simply run deduplication software on your servers. The key point with server-side software is that it can be resource intensive while deduplicating, so if your servers have no excess CPU and RAM capacity, look for an appliance or be prepared to upgrade your hardware. One major advantage of server-side processing is that you don't send any duplicates for backup, which can eliminate a lot of network traffic. If bandwidth is at a premium (think remote office), server-side deduplication might be the best option for you.
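To make the network saving concrete, here is a rough sketch of server-side deduplication before backup. The server hashes its blocks, asks the backup target which ones it lacks, and sends only those. The `BackupTarget` class and the `backup` function are hypothetical stand-ins, not a real backup protocol, and the 4 KB block size is an assumption.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size


def block_hashes(data, size=BLOCK_SIZE):
    """Return (hash, block) pairs for each fixed-size block of data."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(hashlib.sha256(b).hexdigest(), b) for b in blocks]


class BackupTarget:
    """Stand-in for the remote backup server (hypothetical)."""

    def __init__(self):
        self.blocks = {}

    def missing(self, digests):
        # Tell the sender which block hashes are not stored yet.
        return {d for d in digests if d not in self.blocks}

    def upload(self, digest, block):
        self.blocks[digest] = block


def backup(data, target):
    """Send only blocks the target lacks; return payload bytes sent."""
    hashed = block_hashes(data)
    needed = target.missing(d for d, _ in hashed)
    sent = 0
    for digest, block in hashed:
        if digest in needed:
            target.upload(digest, block)
            needed.discard(digest)  # do not resend a repeat within this run
            sent += len(block)
    return sent


target = BackupTarget()
# Ten distinct 4 KB blocks stand in for a file on the remote office server.
data = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
first = backup(data, target)

# The next day only the last block has changed.
changed = data[:-BLOCK_SIZE] + b"\xff" * BLOCK_SIZE
second = backup(changed, target)

print(first, second)
# prints: 40960 4096
```

The hashes themselves still cross the network, but they are tiny compared with the blocks. The cost is the hashing work on the server, which is the CPU load described above.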

A more scalable solution is a deduplication appliance: a standalone machine built to scan its storage for multiple instances of the same data and consolidate them. These are powerful machines designed specifically to maximize storage utilization. Single instance storage appliances can do the deduplication "inline" or as "post-processing", and both have advantages and disadvantages. Inline processing checks each file or block before writing it to storage; if a duplicate already exists, it saves only a pointer. That processing can create a bottleneck in how fast you can send information to the appliance, but it is very efficient with disk usage. Post-processing writes incoming data first and deduplicates it later, so it needs a lot of capacity overhead: enough to keep the deduplicated backup plus whatever new data it receives before the deduplication runs. For incremental backups, that might not be a big deal, but if you do full backups of your system, you can tie up a lot of storage.
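The capacity difference between the two modes can be shown with a toy model that counts blocks on disk. The function names and the block counts (a 100-block full backup in which 5 blocks change) are made-up illustrations, not measurements from any appliance.

```python
import hashlib


def digest(block):
    return hashlib.sha256(block).hexdigest()


def inline_ingest(store, blocks):
    """Hash each block before writing and keep only unseen blocks.

    Returns the peak number of blocks held on disk. The store only
    grows, so the peak equals the final size.
    """
    for block in blocks:
        store.setdefault(digest(block), block)
    return len(store)


def post_process_ingest(store, blocks):
    """Land all incoming blocks first, then deduplicate in a later pass.

    Returns (peak blocks on disk before the pass, blocks kept after it).
    """
    staging = list(blocks)            # raw landing area
    peak = len(store) + len(staging)  # capacity needed before dedup runs
    for block in staging:
        store.setdefault(digest(block), block)
    return peak, len(store)


def make_block(i):
    return i.to_bytes(4, "big") * 1024  # distinct 4 KB block per number


first_full = [make_block(i) for i in range(100)]
# Second full backup: 95 unchanged blocks plus 5 new ones.
second_full = first_full[:95] + [make_block(i) for i in range(100, 105)]

inline_store = {}
inline_ingest(inline_store, first_full)
inline_peak = inline_ingest(inline_store, second_full)

post_store = {}
post_process_ingest(post_store, first_full)
post_peak, post_final = post_process_ingest(post_store, second_full)

print(inline_peak, post_peak, post_final)
# prints: 105 200 105
```

Both modes end up keeping the same 105 blocks, but in this model the post-processing appliance briefly needs room for 200. The inline appliance never does, and pays for it by hashing every block before it can accept the write.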

It is very likely that you will end up with a mixture of these systems in your storage solution. Maybe you will prefer to do file-level deduplication on the server, which has a minimal CPU impact while relieving some of the stress on your network. Then you send that data to a deduplication appliance that does inline processing and keeps a full backup of your system for 30 days before it is archived and sent to off-site storage. Each case is going to be a little different, but with the amount of data that businesses use these days, deduplication is quickly becoming a necessity.