When a client asked me to back up two Synology boxes totaling “about 40TB,” I had no idea I was about to experience the backup from hell. What followed was months of battling failing tape drives, kilobyte-per-second transfer speeds, and the discovery of directories containing millions of files – a perfect storm of everything that could possibly go wrong with a backup.

This blog post summarizes the main points of my latest podcast episode. If you’d like, you can listen to it or watch it at https://www.backupwrapup.com/.
Initial Assessment: Where Everything Started Going Wrong
The first mistake was trusting the client’s estimate of 40TB. In my defense, getting accurate sizing information would have taken days due to the extreme number of files involved. A simple ‘du’ command would hang for days, and ‘find’ commands were even worse. What we actually had was closer to 400TB – a discovery that completely invalidated our initial backup architecture.
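To give a sense of the problem: a filesystem-level check returns instantly because it reads the volume's counters, while du has to stat every file it finds. A minimal sketch, assuming shell access to the Synology (the paths are illustrative):

```bash
# Instant: df reads filesystem counters, so file count doesn't matter
df -h /volume1

# Slow: du stats every file under the path, which is what runs for days
# on trees with tens of millions of entries (shown only for contrast)
du -sh /volume1/some-share
```

Neither tells you the file count, though, which turned out to matter even more than the capacity.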
I started with NetBackup and tape – a solid combination that’s worked for decades. The setup included a Windows server and tape library perfectly sized for 40TB. Everything looked good on paper. But then reality hit, and it hit hard.
The First Signs of the Backup from Hell
The backup speeds were the first red flag – we’re talking 3.5 kilobytes per second. Yes, kilobytes. Even when multiplexing 99 backup jobs simultaneously (NetBackup’s maximum), we only achieved 30-40 megabytes per second aggregate throughput. The math was brutal: at these speeds, we were looking at months or even years to complete the backup.
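Back-of-the-envelope, using the ~400TB that later turned out to be the real number and a generous 35 megabytes per second aggregate (so this is illustrative, not exact):

```bash
# Time to move 400 TB at ~35 MB/s aggregate, streaming 24/7 with no failures:
echo "400 * 10^12 / (35 * 10^6) / 86400" | bc -l   # ~132 days
# A single 3.5 KB/s stream for the same data is simply absurd:
echo "400 * 10^12 / 3500 / 86400 / 365" | bc -l    # ~3,600 years
```

And that 132 days assumes nothing ever breaks, which, as you'll see, was not a safe assumption.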
Tape Drives: Not Your Friend for Marathon Sessions
The half-height LTO drives weren’t designed for continuous operation over months. They wanted nice, fast data streams and regular breaks. Instead, I was giving them a trickle of data 24/7. The result? Constant shoe-shining and drive failures. We went through multiple drives, endless cleaning cycles, and countless reboots. When a drive failed after weeks of backup, we had to start those jobs over – pure torture.
The Million File Problem Reveals Itself
As I dug deeper, I discovered the true villain: directories containing millions of files. One directory alone had 99 million files. This is when I remembered the dreaded “million file problem” from my early backup days. But this was worse – we were dealing with this over SMB, which requires multiple round-trip conversations for each file. The network wasn’t saturated in terms of bandwidth; it was drowning in latency.
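Here's a rough illustration of why latency rather than bandwidth was the killer. The per-file operation count and round-trip time below are assumptions made for the arithmetic, not measurements:

```bash
# 99 million files, assuming ~4 SMB round trips per file (open, getattr, read, close)
# at ~0.5 ms of network latency each, handled serially:
echo "99 * 10^6 * 4 * 0.0005 / 86400" | bc -l   # ~2.3 days spent just waiting on round trips
```

Multiply that across hundreds of millions of files and every extra protocol operation per file, and the 3.5 kilobytes per second stops being mysterious.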
Breaking Down the Technical Nightmare
The real issue wasn’t disk I/O – during all those hundreds of simultaneous backups, I/O wait never exceeded 4%. The Synology boxes weren’t overtaxed either – no high CPU, no RAM issues. It was purely the SMB protocol overhead combined with an astronomical number of files. Each file required multiple network round-trips, creating a cascade of latency that brought everything to a crawl.
Finding Solutions Through Trial and Error
After several failed approaches, I finally developed a multi-pronged strategy:
- Switched from tape to disk backup – expensive but necessary
- Split the backup into 2,400 separate policies for better management
- Used local tar backups for the 20 most problematic directories
- Ran multiple tar backups simultaneously (see the sketch after this list)
- Created extensive scripts to manage the whole process
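Here's the shape of the local-tar piece, heavily simplified. The directory names, the target mount point, and the parallelism count are all hypothetical, and the real scripts did far more bookkeeping:

```bash
#!/bin/bash
# Run local tar backups of the worst directories in parallel,
# writing the archives to disk-based backup storage at /mnt/backup (hypothetical).
PROBLEM_DIRS=(/volume1/app/cache /volume1/app/thumbs /volume2/exports)  # examples only
MAX_JOBS=4

for dir in "${PROBLEM_DIRS[@]}"; do
  # Throttle: wait for a free slot before launching the next tar
  while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do sleep 30; done

  name=$(echo "$dir" | tr '/' '_')
  tar -cf "/mnt/backup/${name}_$(date +%Y%m%d).tar" "$dir" \
      > "/mnt/backup/${name}.log" 2>&1 &
done
wait
echo "All local tar jobs finished."
```

The point isn't the script itself; it's that tar reading the filesystem locally skips the per-file SMB conversation entirely.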
The Scripting Salvation
Cygwin became both my best friend and occasional nemesis. I wrote scripts to monitor job progress, balance loads across the target filers, and handle the constant Windows/Unix path translation issues: backslashes versus forward slashes, deep directory structures, and all the other quirks of moving between the two conventions.
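The one saving grace on the path front is that Cygwin ships cygpath, which handles most of the conversion for you (the paths here are just examples):

```bash
# Convert between Windows and Cygwin/Unix path styles
cygpath -u 'E:\backups\volume1'         # -> /cygdrive/e/backups/volume1
cygpath -w /cygdrive/e/backups/volume1  # -> E:\backups\volume1
```

It's invaluable when a bash script has to hand a path to a Windows-native command line, or vice versa.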
What Finally Worked
The breakthrough came when I realized that backing up locally using tar and then transferring the archives was orders of magnitude faster than backing up over SMB. What had taken 60+ days with negligible progress could now complete in about a day. The difference was staggering.
Lessons from the Backup from Hell
- Never trust data size estimates – verify them yourself if possible
- Always check file counts, not just data volume
- Watch out for applications that create millions of files
- Consider local backup options for extreme file count scenarios
- Keep your scripting skills sharp – they’re your last line of defense
- Standard backup tools may fail in extreme scenarios – have backup plans
- Document everything – you’ll need it when explaining why a “simple” backup took months
The Next Time Around
If I had to do it again (please, no), I’d spend more time upfront analyzing the environment. Running find and du commands might take days, but that’s better than discovering these issues mid-backup. I’d also push harder for local backup access rather than relying on network protocols for extreme file count scenarios.
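In practice that upfront analysis doesn't need to be fancy. Even a crude per-share survey will surface million-file directories before you commit to an architecture; a sketch, with illustrative paths:

```bash
# Count files one top-level share at a time so a single monster directory
# doesn't stall the whole survey, and keep a log as you go
for share in /volume1/*/; do
  count=$(find "$share" -type f 2>/dev/null | wc -l)
  printf '%-40s %12d files\n' "$share" "$count" | tee -a file_counts.txt
done
```

Let it run over a weekend if it has to; it's still cheaper than finding out sixty days into the backup.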
Remember: in backup, it’s not just about the total data size. The number of files can break you faster than the number of terabytes. And when someone says “it’s about 40TB,” make sure to verify that yourself – preferably before committing to any timelines.
Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at S2|DATA, which helps companies manage their legacy data.

