Ryan

Reputation: 15270

How can I efficiently move many files to a new server?

I'm switching hosting providers and need to transfer millions of uploaded files to a new server. All of the files are in the same directory. Yes. You read that correctly. ;)

In the past I've done this:

  1. Zip all of the files from the source server
  2. scp the zip to the new server
  3. Unzip
  4. Move directory to appropriate location
    • for whatever reason my zips from step 1 always bring the full path along with them, so I end up having to mv everything into place (see the sketch below)
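
Roughly, the workflow looks something like this (paths and hostnames are placeholders; cd'ing into the upload directory before zipping should avoid the path problem mentioned above):

cd /var/www/uploads                      # zip from inside the directory so no path gets stored
zip -0 -r -q /tmp/uploads.zip .          # -0: store only; the JPEGs won't compress further anyway
scp /tmp/uploads.zip user@newserver:/tmp/
ssh user@newserver 'cd /var/www/uploads && unzip -q /tmp/uploads.zip'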

The last time I did this it took about 4-5 days to complete and that was about 60% of what I have now.

I'm hoping for a better way. What do you suggest?

File structure is hashed. Something like this: AAAAAAAAAA.jpg - ZZZZZZZZZZ.txt

Here's one idea we're tossing around:

Split the upload into tons of mini-zips based on 3-letter prefixes (sketched after the pros/cons below). Something like:

AAAAAAAAAA.jpg - AAAZZZZZZZ.gif => AAA.zip

Theoretical Pros:

Theoretical Cons:
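
For what it's worth, here's a rough sketch of how we'd generate the mini-zips, assuming uppercase A-Z prefixes like the examples above and a placeholder path (the brace expansion would need extending if the hashes also contain digits or lowercase):

cd /var/www/uploads
for prefix in {A..Z}{A..Z}{A..Z}; do
    # -0 = store only (the images are already compressed), -q = quiet
    ls "${prefix}"* >/dev/null 2>&1 && zip -0 -q "/tmp/${prefix}.zip" "${prefix}"*
done

In practice we'd probably build the file list once and split it, rather than globbing a multi-million-entry directory 17,576 times, but it shows the idea.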

We've also thought about rsync and scp, but worry about the expense of transferring each file individually. And since the remote server is empty, I don't need to worry about what's already there.

What do you think? How would you do it?

(Yes, I'll be moving these to Amazon S3 eventually, and I'll just ship them a disk, but in the meantime, I need them up yesterday!)

Upvotes: 5

Views: 4584

Answers (3)

Matt Clark

Reputation: 28629

You actually have multiple options; my favorite would be rsync.

rsync [dir1] [dir2]

This command will actually compare the directories, and sync only the differences between them.

With this, I would most likely use the following:

rsync -az -e ssh [email protected]:/var/www/ /var/www/

-z compress file data during the transfer
-a archive mode (recurse and preserve permissions, ownership, and timestamps)
-e specify the remote shell to use (ssh here)
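
Since the destination starts out empty and most of the files are already-compressed images, a variant like this might be a little faster (host and paths are placeholders):

# -W copies whole files, skipping rsync's delta algorithm, which buys nothing
#    against an empty target; -z is dropped because JPEGs won't compress further
rsync -aW -e ssh user@oldserver:/var/www/ /var/www/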

You could also use SFTP (file transfer over SSH).

Or even wget, if the files are also reachable over HTTP or FTP (wget doesn't speak SSH):

wget -rc http://www.example.com/var/www/

Upvotes: 9

Nathan Moinvaziri

Reputation: 5638

What about using BitTorrent? It may not be as easy to set up, but once you have it going it should do exactly what you want. BitTorrent was developed to facilitate the transfer of large files. You would need a client on the source machine and one on the destination machine. Create the metafile on the source machine, copy it to the destination machine, and load it up in your BitTorrent client. Manually enter the IP of the source machine as a peer. As long as no firewalls block you, the transfer should start. Optionally, you could first zip up all the files with no compression (the STORED method) and transfer that single zip via BitTorrent.
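
A rough sketch of the metafile step, assuming mktorrent is installed (the tool choice is mine; any torrent creator works) and using placeholder paths, hosts, and tracker URL:

# On the source machine: bundle with STORED (no) compression, then build the metafile.
zip -0 -r -q /tmp/uploads.zip /var/www/uploads
mktorrent -a "http://tracker.example.com/announce" -o /tmp/uploads.torrent /tmp/uploads.zip

# Copy just the small .torrent across; seed uploads.zip from the source machine
# and open uploads.torrent in the destination's client, adding the source IP as a peer.
scp /tmp/uploads.torrent user@newserver:/tmp/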

Upvotes: 0

TooLazyToLogIn

Reputation: 284

I'm from the Linux/Unix world. I'd use tar to make a number of tar files each of a set size. E.g.:

tar -cML $MAXIMUM_FILE_SIZE_IN_KILOBYTES --file=${FILENAME}_{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}.tar ${THE_FILES}

I'd skip recompression unless your .txt files are huge. You won't get much mileage out of recompressing .jpeg files, and it will eat up a lot of CPU (and real) time.

I'd look into how your traffic shaping works. How many concurrent connections can you have? How much bandwidth per connection? How much total?

I've seen some interesting things with scp. Testing on a home network, scp gave much lower throughput than copying over a mounted shared smbfs filesystem. I'm not entirely clear why, though that may be desirable if scp is verifying the copy and requesting retransmission on errors. (There is a very small probability of an error making it through in a packet transmitted over the internet; without some sort of subsequent verification stage, that becomes a real concern with large data sets. You might want to run md5 hashes...)
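
For that verification step, something like this would do (paths are placeholders):

# On the source machine: build a checksum manifest of every file.
cd /var/www/uploads
find . -type f -print0 | xargs -0 md5sum > /tmp/uploads.md5

# On the destination, after the transfer and after copying the manifest over:
cd /var/www/uploads
md5sum -c --quiet /tmp/uploads.md5     # prints only files that fail the check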

If this is a webserver, you could always just use wget. Though that seems highly inefficient...

Upvotes: 1
