Reputation:
Ruby 1.9.3, net-ssh 2.9.2
I am working on a project in which I need to diff the same directory (and its subdirectories) on two servers, one local and one remote. From there, I need to copy the most recently modified version of each file to the other server, and delete a file from the remote if it is not present on the local.
NOTE: I cannot use rsync. We are backing up Asterisk-related directories to GlusterFS. With thousands of files, rsync comparing local to the Gluster volume is very slow (we need it to finish in under 1 minute).
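To make the comparison step concrete, here is a minimal sketch of the decision logic only, assuming each side has already been reduced to a hash of relative path => modification time. The name plan_sync and the shape of the returned hash are made up for illustration; nothing here copies or deletes anything.

```ruby
# Hypothetical helper: `local` and `remote` are hashes of
# relative path => Time (modification time on that server).
def plan_sync(local, remote)
  plan = { :push => [], :pull => [], :delete_remote => [] }

  local.each do |path, local_mtime|
    remote_mtime = remote[path]
    if remote_mtime.nil? || local_mtime > remote_mtime
      plan[:push] << path   # missing or older on the remote
    elsif remote_mtime > local_mtime
      plan[:pull] << path   # remote copy is newer
    end
  end

  # Files that exist only on the remote get deleted there.
  plan[:delete_remote] = remote.keys - local.keys
  plan
end
```

Files with identical timestamps end up in none of the three lists, so they are left alone.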
Here is my current code. I am omitting my work for copying/removing files, as I want to take this one step at a time.
require 'thread'
require 'date'
require 'rubygems'
require 'net/ssh'

SERVERS = ['local17', 'development']
CLIENT = SERVERS[0]
CLIENT_PATH = '/home/hstevens/temp_gfs'
BRICK_PATH = '/export/hunter_test'

@files = {
  SERVERS[0] => {},
  SERVERS[1] => {}
}

def grab_filenames_and_dates(files, server)
  files.reject { |x| File.directory? x }
  files.each do |file|
    name = `ls --full-time "#{file}" | awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print $0}'`.strip
    date = `ls --full-time "#{file}" | awk '{print $6, $7, $8}'`.strip
    @files[server][name] = DateTime.parse(date)
  end
end

# Collect diff information on all servers
ls_threads = SERVERS.map do |server|
  Thread.new do
    if server == CLIENT
      files = Dir.glob("#{CLIENT_PATH}/**/*")
      grab_filenames_and_dates(files, server)
    else
      Net::SSH.start(server, 'hstevens') do |session|
        files = session.exec!(%Q(ruby -e 'puts Dir.glob("#{BRICK_PATH}/**/*")')).split("\n")
        grab_filenames_and_dates(files, server)
      end
    end
  end
end
ls_threads.each(&:join)
When I run my program, it works for the local server (CLIENT/local17), but fails on the remote server. I tried debugging statements (printing pwd to the console), and it appears that although the method is called inside the Net::SSH session block, it is acting on my local server.
ls: cannot access /export/hunter_test/sorttable.js: No such file or directory
ls: cannot access /export/hunter_test/sorttable.js: No such file or directory
./gluster_rsync.rb:36:in `parse': invalid date (ArgumentError)
from ./gluster_rsync.rb:36:in `block in grab_filenames_and_dates'
from ./gluster_rsync.rb:33:in `each'
from ./gluster_rsync.rb:33:in `grab_filenames_and_dates'
from ./gluster_rsync.rb:53:in `block (3 levels) in <main>'
from /usr/local/lib/ruby/gems/1.9.1/gems/net-ssh-2.9.2/lib/net/ssh.rb:215:in `start'
from ./gluster_rsync.rb:51:in `block (2 levels) in <main>'
How can I properly wrap a method call inside a Net::SSH session?
Upvotes: 2
Views: 1900
Reputation:
The accepted answer helped me arrive at the following solution. Knowing that session.exec!() only runs shell commands, I decided to split the method (see question) into multiple steps within the SSH block.
Thread.new do
  files = nil
  Net::SSH.start(server, 'hstevens') do |session|
    # Filter out directories on the remote side: File.directory? called here
    # would check the local filesystem, not the remote one.
    files = session.exec!(%Q(cd "#{BRICK_PATH}" ; ruby -e 'puts Dir.glob("**/*").reject { |f| File.directory?(f) }')).split("\n")
    files.each do |file|
      # Both ls calls run on the remote host via exec!
      name = session.exec!(%Q(ls --full-time "#{BRICK_PATH}/#{file}" | awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print $0}')).strip
      date = session.exec!(%Q(ls --full-time "#{BRICK_PATH}/#{file}" | awk '{print $6, $7, $8}')).strip
      @files[server][name] = DateTime.parse(date)
    end
  end
end
I do not know yet if this proves faster (need to run a benchmark), but it is definitely better than SSH-ing in several system() calls.
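If the two exec! calls per file turn out to be the bottleneck, one option is to fetch every path and modification time in a single remote command. The sketch below is untested against a real Gluster brick and assumes GNU find is available on the remote host (for -printf). The helper names remote_mtimes and parse_find_output are hypothetical, and session is anything that responds to exec!, as a Net::SSH session does.

```ruby
require 'shellwords'

# Hypothetical helper: one remote round trip for the whole tree.
# Assumes GNU find on the remote host.
def remote_mtimes(session, root)
  # %T@ = mtime in seconds since the epoch, %P = path relative to root
  cmd = "find #{Shellwords.escape(root)} -type f -printf '%T@ %P\\n'"
  parse_find_output(session.exec!(cmd).to_s)
end

# Turns "<seconds> <path>" lines into a hash of path => Time.
def parse_find_output(output)
  output.each_line.each_with_object({}) do |line, files|
    seconds, path = line.chomp.split(' ', 2)
    next if path.nil? || path.empty?
    # Skip anything that is not a timestamp line, e.g. find's error messages.
    next unless seconds =~ /\A\d+(\.\d+)?\z/
    files[path] = Time.at(seconds.to_f)
  end
end
```

Inside the Net::SSH.start block this would be called as remote_mtimes(session, BRICK_PATH). File names containing newlines would confuse the line-based parsing, so treat that as a known limitation of the sketch.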
Upvotes: 1
Reputation: 84172
Ruby code running inside the Net::SSH block still runs on your computer (this includes methods that run commands, like system or backticks).
To execute a command on the remote server you need to use session.exec or session.exec! (the latter is blocking, the former requires you to run the SSH event loop). You can also open a channel explicitly and execute a command there; these methods are convenience wrappers.
There is no special support for running Ruby remotely. You can of course use exec! to run ruby on the other machine (assuming it is installed), but that's it.
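One way to reuse a single method for both servers is to pass in the thing that runs the command, rather than hard-coding backticks. This is only a sketch: grab_dates and the runner lambdas are hypothetical names, and it assumes GNU stat (for -c %Y) on whichever machine the command ends up running on.

```ruby
require 'shellwords'

# Hypothetical rework of the question's method: `run` is any callable that
# takes a shell command string and returns its output.
def grab_dates(files, run)
  files.each_with_object({}) do |file, dates|
    # stat -c %Y prints the mtime as seconds since the epoch (GNU stat).
    out = run.call("stat -c %Y #{Shellwords.escape(file)}").to_s.strip
    dates[file] = Time.at(out.to_i) if out =~ /\A\d+\z/
  end
end

# Runs the command on the machine executing this script.
local_runner = lambda { |cmd| `#{cmd}` }

# Inside Net::SSH.start(...) do |session| ... end, the remote counterpart
# would be:
#   remote_runner = lambda { |cmd| session.exec!(cmd) }
```

The method body is identical in both cases; the only thing that decides where the command executes is which runner is passed in.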
Upvotes: 1
Reputation: 3935
I'm 100% NOT trolling you ... but ... your synopsis is the very reason rsync was created: moving files between servers efficiently, with diff capability.
IMO it's a bit misguided to think you can do better than 20 years of battle-tested C code, which FWIW will execute much faster than Ruby code. That is probably why so many are rallying to rsync as the solution.
Although rsync is single-threaded... ask yourself why that is... just because you can multi-thread in Ruby doesn't mean that you should. It's going to open a whole other spaghetti monster: you'll soon find yourself tasked with "handling" duplicates or incorrect versions. See MongoDB discussions on atomicity. You won't even get close to atomic in Ruby, so it WILL be an issue.
I would be sure to use a thread-safe language if you want to go down that route, at the least JRuby. FWIW thread safety was one of the many reasons José created Elixir, as he was exasperated by Ruby not truly having it.
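If you do stay with Ruby threads, one way to reduce the shared-state risk is to avoid writing to a common hash from inside the threads at all. A rough sketch, where collect_listings is a hypothetical name and the block stands in for whatever per-server listing step you end up with:

```ruby
# Hypothetical helper: runs the given block once per server, each in its own
# thread, and merges the results only after every thread has finished.
def collect_listings(servers, &lister)
  threads = servers.map do |server|
    # Each thread returns its own [server, result] pair; nothing is shared.
    Thread.new { [server, lister.call(server)] }
  end
  # Thread#value joins the thread and returns the block's result.
  Hash[threads.map(&:value)]
end
```

This does nothing for atomicity of the file operations themselves; it only keeps the bookkeeping out of the threads.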
However, IMO something is wrong with your approach and you need to take a few steps back and look at the problem holistically. Maybe there is a similar solution to GlusterFS that can handle the dedup at the FS level, or maybe you need to handle file addition through an API or some sort of queuing system that will process the files in sequential order. It may require a larger change than you're willing or able to make, though. If it were me, I would be hesitant to just cowboy-code something up in Ruby, because some developer is going to end up jumping into that code someday and facepalm instantly.
Multithread rsync not ruby
The only solution I can readily come up with is to focus on making the rsync transfer faster.
Perhaps you can speed rsync up with threads instead.
Or use this person's approach. This does seem to be an issue with GlusterFS, but rsync with the proper flags/signals can do the differential sync better. Then your Ruby scripts could pick up the files from the master source.
Upvotes: 2