Ivan Monteiro

Reputation: 169

How to recursively download FTP folder in parallel in Ruby?

I need to cache an FTP folder locally in Ruby. Right now I'm using ftp_sync to download the folder, but it's painfully slow. Do you guys know of any library that can download the folder's files in parallel? Thanks!

Upvotes: 3

Views: 2589

Answers (2)

the Tin Man

Reputation: 160601

Take a look at Curb. It's a wrapper around Curl, and can do multiple connections in parallel.

This is a modified version of one of their examples:

require 'curb'

urls = %w[
  http://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.3-p286.tar.bz2
  http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2
]

responses = {}
m = Curl::Multi.new

# add a few easy handles
urls.each do |url|
  responses[url] = Curl::Easy.new(url)
  puts "Queuing #{ url }..."
  m.add(responses[url])
end

spinner_counter = 0
spinner = %w[| / - \\] # the backslash has to be escaped inside %w
m.perform do
  print 'Performing downloads ', spinner[spinner_counter], "\r"
  spinner_counter = (spinner_counter + 1) % spinner.size
end
puts

urls.each do |url|
  print "[#{ url } #{ responses[url].total_time } seconds] Saving #{ responses[url].body_str.size } bytes..."
  File.open(File.basename(url), 'wb') { |fo| fo.write(responses[url].body_str) }
  puts 'done.'
end

That'll pull in both the Ruby and Python source tarballs (which are pretty big, so they'll take about a minute, depending on your internet connection and the host). You won't see any files appear until the last block, where they get written out.

Upvotes: 1

joelparkerhenderson

Reputation: 35483

The syncftp gem may help you:

http://rubydoc.info/gems/syncftp/0.0.3/frames

Ruby has a decent built-in FTP library in case you want to roll your own:

http://www.ruby-doc.org/stdlib-1.9.3/libdoc/net/ftp/rdoc/Net/FTP.html
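For example, here's a rough sketch (untested, with a placeholder host, login, and paths) of recursively mirroring a remote directory with Net::FTP:

require 'net/ftp'
require 'fileutils'

# Recursively mirror remote_dir into local_dir over an open Net::FTP session.
# Telling files from directories via NLST is server-dependent; this simply
# tries to chdir into each entry and treats failures as plain files.
def mirror(ftp, remote_dir, local_dir)
  FileUtils.mkdir_p(local_dir)
  ftp.chdir(remote_dir)
  ftp.nlst.each do |name|
    next if name == '.' || name == '..'
    begin
      ftp.chdir(name)       # succeeds only for directories...
      ftp.chdir('..')
      mirror(ftp, File.join(remote_dir, name), File.join(local_dir, name))
    rescue Net::FTPPermError
      # ...otherwise it's a regular file, so download it
      ftp.getbinaryfile(name, File.join(local_dir, name))
    end
  end
  ftp.chdir('..')
end

Net::FTP.open('ftp.example.com') do |ftp|
  ftp.login('anonymous', 'me@example.com')
  mirror(ftp, '/pub', './cache/pub')
end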

To download files in parallel, you can use multiple threads with timeouts:

Ruby Net::FTP Timeout Threads
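Here's one way that could look (just a sketch, untested, with a placeholder host and file list): a small pool of threads, each with its own FTP connection since a single Net::FTP session shouldn't be shared, pulling paths off a queue with a timeout around each transfer:

require 'net/ftp'
require 'thread'
require 'timeout'

host  = 'ftp.example.com'
files = %w[/pub/a.tar.bz2 /pub/b.tar.bz2 /pub/c.tar.bz2]
queue = Queue.new
files.each { |f| queue << f }

workers = 4.times.map do
  Thread.new do
    Net::FTP.open(host) do |ftp|
      ftp.login('anonymous', 'me@example.com')
      begin
        loop do
          path = queue.pop(true)     # non-blocking; raises ThreadError when empty
          Timeout.timeout(300) do    # give up on any single file after 5 minutes
            ftp.getbinaryfile(path, File.basename(path))
          end
        end
      rescue ThreadError
        # queue drained, this worker is finished
      end
    end
  end
end

workers.each(&:join)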

A great way to get parallel work done is Celluloid, the concurrent framework:

https://github.com/celluloid/celluloid
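As a sketch (untested, with placeholder host and file names), the same idea with Celluloid might look like a pool of downloader actors, each fetching a file over its own FTP connection via futures:

require 'celluloid'
require 'net/ftp'

class Downloader
  include Celluloid

  # Each call opens its own FTP connection, grabs one file, and returns the path.
  def fetch(host, path)
    Net::FTP.open(host) do |ftp|
      ftp.login('anonymous', 'me@example.com')
      ftp.getbinaryfile(path, File.basename(path))
    end
    path
  end
end

pool    = Downloader.pool(size: 4)
files   = %w[/pub/a.tar.bz2 /pub/b.tar.bz2 /pub/c.tar.bz2]
futures = files.map { |f| pool.future.fetch('ftp.example.com', f) }
futures.each { |future| puts "finished #{future.value}" }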

All that said, if the download speed is limited to your overall network bandwidth, then none of these approaches will help much.

To speed up the transfers in this case, be sure you're only downloading the information that's changed: new files and changed sections of existing files.

Segmented downloading can give massive speedups in some cases, such as downloading big log files where only a small percentage of the file has changed and the changes are all appends at the end.

You can also consider shelling out to the command line. There are many tools that can help you with this. A good general-purpose one is "curl", which supports simple byte ranges for FTP files as well. For example, you can get the first 100 bytes of a document over FTP like this:

curl -r 0-99 ftp://www.get.this/README

Are you open to other protocols besides FTP? Take a look at the "rsync" command, which is excellent for download synchronization and has many optimizations to transfer just the changed data. For example, rsync can sync a remote directory to a local directory like this:

rsync -auvC user@remotehost:/remote/foo/ /local/foo/

Upvotes: 2
