Reputation: 169
I need to cache an ftp folder locally in ruby. Right now I'm using ftp_sync to download the ftp folder but it's painfully slow, do you guys know any library that can download the folder files in parallel? Thanks!
Upvotes: 3
Views: 2589
Reputation: 160601
Take a look at Curb. It's a wrapper around Curl, and can do multiple connections in parallel.
This is a modified version of one of their examples:
require 'curb'
urls = %w[
http://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.3-p286.tar.bz2
http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2
]
responses = {}
m = Curl::Multi.new
# add a few easy handles
urls.each do |url|
responses[url] = Curl::Easy.new(url)
puts "Queuing #{ url }..."
m.add(responses[url])
end
spinner_counter = 0
spinner = %w[ | / - \ ]
m.perform do
print 'Performing downloads ', spinner[spinner_counter], "\r"
spinner_counter = (spinner_counter + 1) % spinner.size
end
puts
urls.each do |url|
print "[#{ url } #{ responses[url].total_time } seconds] Saving #{ responses[url].body_str.size } bytes..."
File.open(File.basename(url), 'wb') { |fo| fo.write(responses[url].body_str) }
puts 'done.'
end
That'll pull in both the Ruby and Python source (which are pretty big so they'll take about a minute, depending on your internet connection and host). You won't see any files appear until the last block, where they get written out.
Upvotes: 1
Reputation: 35483
The syncftp gem may help you:
http://rubydoc.info/gems/syncftp/0.0.3/frames
Ruby has a decent built-in FTP library in case you want to roll your own:
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/net/ftp/rdoc/Net/FTP.html
To download files in parallel, you can use multiple threads with timeouts:
A great way to get parallel work done is Celluloid, the concurrent framework:
https://github.com/celluloid/celluloid
All that said, if the download speed is limited to your overall network bandwidth, then none of these approaches will help much.
To speed up the transfers in this case, be sure you're only downloading the information that's changed: new files and changed sections of existing files.
Segmented downloading can give massive speedups in some cases, such as downloaded big log files where only a small percentage of the file has changed, and the changes are all at the end of the file, and are all appends.
You can also consider shelling out to the command line. There are many tools that can help you with this. A good general-purpose one is "curl", which supports simple ranges for FTP files as well, for example you can get the first 100 bytes of a document using FTP like this:
curl -r 0-99 ftp://www.get.this/README
Are you open to other protocols besides FTP? Take a look at the "rsync" command, which is excellent for download synchronization. The rsync command has many optimizations to transfer just the changed data. For example rsync can sync a remote directory to a local directory like this:
rsync -auvC [email protected]:/remote/foo/ /local/foo/
Upvotes: 2