Reputation: 15270
I'm switching hosting providers and need to transfer millions of uploaded files to a new server. All of the files are in the same directory. Yes. You read that correctly. ;)
In the past I've done this: zip everything up and scp the zip to the new server.
The last time I did this it took about 4-5 days to complete, and that was with about 60% of what I have now.
I'm hoping for a better way. What do you suggest?
File structure is hashed. Something like this: AAAAAAAAAA.jpg - ZZZZZZZZZZ.txt
Here's one idea we're tossing around:
Split the zip into tons of mini-zips based on 3-letter prefixes. Something like:
AAAAAAAAAA.jpg - AAAZZZZZZZ.gif => AAA.zip
Theoretical Pros:
Theoretical Cons: zipping would take longer, since each mini-zip has to find its files (everything matching AAA*, and so on), perhaps offset by running many zip threads at once, using all CPUs instead of only one.
We've also thought about rsync and scp, but worry about the expense of transferring each file individually. And since the remote server is empty, I don't need to worry about what's already there.
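Roughly what we have in mind for the prefix split, as a sketch (assuming a shell with brace expansion plus GNU xargs and coreutils; the prefix set and zip flags are illustrative):
# Build one stored (uncompressed) zip per 3-letter prefix, running one zip worker per CPU.
printf '%s\n' {A..Z}{A..Z}{A..Z} \
  | xargs -P"$(nproc)" -I{} sh -c 'zip -q -0 "$1.zip" "$1"* 2>/dev/null || true' _ {}
Each worker still has to glob the huge directory for its prefix, which is the cost mentioned above.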
What do you think? How would you do it?
(Yes, I'll be moving these to Amazon S3 eventually, and I'll just ship them a disk, but in the meantime, I need them up yesterday!)
Upvotes: 5
Views: 4584
Reputation: 28629
You actually have multiple options; my favorite would be using rsync.
rsync [dir1] [dir2]
This command will actually compare the directories, and sync only the differences between them.
With this, I would be most likely to use the following:
rsync -z -e ssh user@remote-host:/var/www/ /var/www/
-z compress file data during the transfer
-e ssh use ssh as the remote shell
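For a tree this size you would probably also want archive mode and resumability; a possible variant (the host and paths are illustrative, not from the original command):
# -a preserves permissions/timestamps, --partial keeps half-transferred files so a rerun can resume
rsync -az --partial --progress -e ssh user@remote-host:/var/www/ /var/www/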
You could also use SFTP, i.e. FTP over SSH.
Or even wget:
wget -rc ftp://user@remote-host/var/www/
Upvotes: 9
Reputation: 5638
What about using BitTorrent? It may not be as easy to set up, but once you have it going it should do exactly what you want. BitTorrent was developed to facilitate the transfer of large files. You would need a client on the source machine and one on the destination machine. Create the metafile on the source machine, copy it to the destination machine, and load it up in your BitTorrent client. Manually enter the IP of the source machine. As long as you have no firewalls blocking you, the transfer should start. Optionally, you could zip up all the files first using no compression (aka STORED compression) and then transfer the zip using BitTorrent.
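A minimal sketch of the "stored zip plus metafile" step, assuming zip and mktorrent are installed (the tracker URL, paths, and file names are purely illustrative):
# -0 stores files without compression, -r recurses, -q keeps the output quiet
zip -0 -r -q uploads.zip /path/to/uploads
# Build the .torrent metafile to copy to the destination's BitTorrent client
mktorrent -a http://tracker.example.com/announce -o uploads.torrent uploads.zip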
Upvotes: 0
Reputation: 284
I'm from the Linux/Unix world. I'd use tar to make a number of tar files each of a set size. E.g.:
tar -cML $MAXIMUM_FILE_SIZE_IN_KILOBYTES --file=${FILENAME}_{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}.tar ${THE_FILES}
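With millions of files in one directory, expanding ${THE_FILES} on the command line may exceed ARG_MAX; a sketch of a workaround under that assumption, using GNU tar's file-list option to pipe the names in instead:
# Feed the file list to tar over stdin instead of as command-line arguments
find . -maxdepth 1 -type f -print0 \
  | tar -cM -L $MAXIMUM_FILE_SIZE_IN_KILOBYTES --null -T - --file=${FILENAME}_{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}.tar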
I'd skip recompression unless your .txt files are huge. You won't get much mileage out of recompressing .jpeg files, and it will eat up a lot of CPU (and real) time.
I'd look into how your traffic shaping works. How many concurrent connections can you have? How much bandwidth per connection? How much total?
I've seen some interesting things with scp. Testing on a home network, scp gave much lower throughput than copying over a mounted smbfs share. I'm not entirely clear why, though that may be desirable if scp is verifying the copy and requesting retransmission on errors. (There is a very small probability of an error making it through in a packet transmitted over the internet. Without some sort of subsequent verification stage, that's a real problem with large data sets. You might want to run md5 hashes...)
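A sketch of that verification pass, assuming md5sum from coreutils on both ends (the manifest path is illustrative):
# On the source: record a checksum for every file in the upload directory
find . -maxdepth 1 -type f -print0 | xargs -0 md5sum > /tmp/manifest.md5
# On the destination, after the transfer, run from the same directory:
md5sum -c /tmp/manifest.md5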
If this is a webserver, you could always just use wget. Though that seems highly inefficient...
Upvotes: 1