Reputation: 629
I've got two main file servers and a big backup server, but over time someone has misorganized the backup server, and I need to make sure there are no files on the backup server that aren't also on the main servers.
So I thought I'd write some quick Ruby code to do this: it builds a list of all files on each drive (using Dir.glob) and, for each file on the backup drive, checks for a matching file on the main drives by comparing File.size and File.basename.
The problem is that it takes a while! Checking each backup file against the main drives takes ~0.8 s, and with hundreds of thousands of files on a drive, this isn't going to work.
Any suggestions? I'm assuming my way is very inefficient.
Upvotes: 1
Views: 45
Reputation: 10215
Dir.glob returns an Array, so you end up scanning the full list of files for every file you search for. With 100,000 files, that means on the order of 100,000^2 operations. You can speed things up considerably by loading the list into a Set instead, which has constant-time lookup, reducing the workload to roughly 100,000 operations. You can try something like this:
require 'set'
files_to_search = Set.new(Dir.glob('/that/path/**/*'))
files_to_search.include?('foo')  # constant-time membership test
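Applying that idea to the question's basename-plus-size comparison, a minimal sketch might look like the following. The mount points and the find_orphans name are hypothetical, and the [basename, size] key reproduces the question's matching rule, which treats same-named, same-sized files in different directories as identical:

```ruby
require 'set'

# Index every file on the main servers by [basename, size] in a Set,
# then flag backup files whose pair never appears there.
def find_orphans(main_globs, backup_glob)
  known = Set.new
  main_globs.each do |pattern|
    Dir.glob(pattern).each do |path|
      known << [File.basename(path), File.size(path)] if File.file?(path)
    end
  end

  # One O(1) Set lookup per backup file, instead of rescanning the whole list.
  Dir.glob(backup_glob).select do |path|
    File.file?(path) && !known.include?([File.basename(path), File.size(path)])
  end
end

# Hypothetical mount points:
# orphans = find_orphans(['/main1/**/*', '/main2/**/*'], '/backup/**/*')
```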
You might also be running into other constraints, however, such as memory, or the fact that Ruby isn't all that fast by comparison, so if Set doesn't do the trick, you might want to try a shell tool. Michał Młoźniak's rsync solution might work, or you could patch together a handful of other shell commands to get the information you're looking for. You could check out diff, for example, perhaps paired with find.
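One way the diff-plus-find idea could be sketched, assuming /backup and /main1 are hypothetical mount points: build a sorted relative file listing of each tree, then diff the two lists. Lines starting with < are paths present only on the backup.

```shell
# List files relative to each tree root, sorted so diff lines up matches.
(cd /backup && find . -type f | sort) > /tmp/backup.list
(cd /main1  && find . -type f | sort) > /tmp/main.list

# Lines prefixed '<' exist only in backup.list, i.e. only on the backup.
diff /tmp/backup.list /tmp/main.list
```

Note this compares paths only, not sizes or contents, so it is stricter about location and looser about content than the Ruby approach.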
Upvotes: 0
Reputation: 5556
Forget Ruby, just read the manual for the rsync command. You can use --dry-run or some other mix of options to compare the main directories against the backup without copying any files. It will be much faster, both in execution time and in the time spent making this work.
Upvotes: 2