Ivan Monteiro

Reputation: 169

How to recursively download FTP folder in parallel in Ruby?

I need to cache an FTP folder locally in Ruby. Right now I'm using ftp_sync to download the folder, but it's painfully slow. Do you guys know of any library that can download the folder's files in parallel? Thanks!

Upvotes: 3

Views: 2589

Answers (2)

the Tin Man

Reputation: 160601

Take a look at Curb. It's a wrapper around Curl, and can do multiple connections in parallel.

This is a modified version of one of their examples:

require 'curb'

urls = %w[
  http://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.3-p286.tar.bz2
  http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2
]

responses = {}
m = Curl::Multi.new

# add a few easy handles
urls.each do |url|
  responses[url] = Curl::Easy.new(url)
  puts "Queuing #{ url }..."
  m.add(responses[url])
end

spinner_counter = 0
spinner = %w[| / - \\] # the backslash has to be escaped inside %w
m.perform do
  print 'Performing downloads ', spinner[spinner_counter], "\r"
  spinner_counter = (spinner_counter + 1) % spinner.size
end
puts

urls.each do |url|
  print "[#{ url } #{ responses[url].total_time } seconds] Saving #{ responses[url].body_str.size } bytes..."
  File.open(File.basename(url), 'wb') { |fo| fo.write(responses[url].body_str) }
  puts 'done.'
end

That'll pull in both the Ruby and Python source tarballs (which are pretty big, so they'll take about a minute, depending on your internet connection and the host). You won't see any files appear until the last block, where they get written out.

Upvotes: 1

joelparkerhenderson

Reputation: 35483

The syncftp gem may help you:

http://rubydoc.info/gems/syncftp/0.0.3/frames

Ruby has a decent built-in FTP library in case you want to roll your own:

http://www.ruby-doc.org/stdlib-1.9.3/libdoc/net/ftp/rdoc/Net/FTP.html
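For example, here's a rough sketch (untested, with a placeholder host, login, and paths) of recursively mirroring a remote directory with Net::FTP:

require 'net/ftp'
require 'fileutils'

# Recursively mirror remote_dir into local_dir over an open Net::FTP session.
# Telling files from directories via NLST is server-dependent; this simply
# tries to chdir into each entry and treats failures as plain files.
def mirror(ftp, remote_dir, local_dir)
  FileUtils.mkdir_p(local_dir)
  ftp.chdir(remote_dir)
  ftp.nlst.each do |name|
    next if name == '.' || name == '..'
    begin
      ftp.chdir(name)       # succeeds only for directories...
      ftp.chdir('..')
      mirror(ftp, File.join(remote_dir, name), File.join(local_dir, name))
    rescue Net::FTPPermError
      # ...otherwise it's a regular file, so download it
      ftp.getbinaryfile(name, File.join(local_dir, name))
    end
  end
  ftp.chdir('..')
end

Net::FTP.open('ftp.example.com') do |ftp|
  ftp.login('anonymous', 'me@example.com')
  mirror(ftp, '/pub', './cache/pub')
end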

To download files in parallel, you can use multiple threads with timeouts:

Ruby Net::FTP Timeout Threads
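Here's one way that could look (just a sketch, untested, with a placeholder host and file list): a small pool of threads, each with its own FTP connection since a single Net::FTP session shouldn't be shared, pulling paths off a queue with a timeout around each transfer:

require 'net/ftp'
require 'thread'
require 'timeout'

host  = 'ftp.example.com'
files = %w[/pub/a.tar.bz2 /pub/b.tar.bz2 /pub/c.tar.bz2]
queue = Queue.new
files.each { |f| queue << f }

workers = 4.times.map do
  Thread.new do
    Net::FTP.open(host) do |ftp|
      ftp.login('anonymous', 'me@example.com')
      begin
        loop do
          path = queue.pop(true)     # non-blocking; raises ThreadError when empty
          Timeout.timeout(300) do    # give up on any single file after 5 minutes
            ftp.getbinaryfile(path, File.basename(path))
          end
        end
      rescue ThreadError
        # queue drained, this worker is finished
      end
    end
  end
end

workers.each(&:join)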

A great way to get parallel work done is Celluloid, the concurrent framework:

https://github.com/celluloid/celluloid
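As a sketch (untested, with placeholder host and file names), the same idea with Celluloid might look like a pool of downloader actors, each fetching a file over its own FTP connection via futures:

require 'celluloid'
require 'net/ftp'

class Downloader
  include Celluloid

  # Each call opens its own FTP connection, grabs one file, and returns the path.
  def fetch(host, path)
    Net::FTP.open(host) do |ftp|
      ftp.login('anonymous', 'me@example.com')
      ftp.getbinaryfile(path, File.basename(path))
    end
    path
  end
end

pool    = Downloader.pool(size: 4)
files   = %w[/pub/a.tar.bz2 /pub/b.tar.bz2 /pub/c.tar.bz2]
futures = files.map { |f| pool.future.fetch('ftp.example.com', f) }
futures.each { |future| puts "finished #{future.value}" }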

All that said, if the download speed is limited to your overall network bandwidth, then none of these approaches will help much.

To speed up the transfers in this case, be sure you're only downloading the information that's changed: new files and changed sections of existing files.

Segmented downloading can give massive speedups in some cases, such as downloading big log files where only a small percentage of the file has changed and the changes are all appends at the end.

You can also consider shelling out to the command line. There are many tools that can help you with this. A good general-purpose one is "curl", which supports simple byte ranges for FTP files as well. For example, you can get the first 100 bytes of a document over FTP like this:

curl -r 0-99 ftp://www.get.this/README

Are you open to other protocols besides FTP? Take a look at the "rsync" command, which is excellent for download synchronization and has many optimizations to transfer just the changed data. For example, rsync can sync a remote directory to a local directory like this:

rsync -auvC user@remotehost:/remote/foo/ /local/foo/

Upvotes: 2
