Reputation: 9183
I've been using GNU parallel for a while, mostly to grep large files or to run the same command over many different arguments when each invocation is slow and needs to be spread out across cores or hosts.
Another thing that would be great to do across multiple cores and hosts is finding a file in a large directory subtree. For example, something like this:
find /some/path -name 'regex'
will take a very long time if /some/path
contains many files and many subdirectories that are themselves full of files. I'm not sure this is as easy to speed up. For example:
ls -R -1 /some/path | parallel --profile manyhosts --pipe egrep regex
Something like that comes to mind, but ls
would be very slow to enumerate the files to search. What, then, is a good way to speed up such a find?
Upvotes: 3
Views: 6386
Reputation: 123460
If you have N hundred immediate subdirs, you can use:
parallel --gnu -n 10 find {} -name 'regex' ::: *
to run find
on them in parallel, passing ten directories to each find invocation (the number of finds running concurrently defaults to one per CPU core).
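If the hosts behind the manyhosts profile mentioned in the question can all see the same tree (say, over NFS mounted at the same path on every machine), the same idea extends across hosts; this is only a sketch under that assumption, and with a shared filesystem the file server itself may remain the bottleneck:
# Assumes /some/path is mounted at the same location on every host in the
# manyhosts profile (taken from the question). The first find covers files
# sitting directly in /some/path; the second hands the immediate
# subdirectories to parallel, ten per find invocation.
find /some/path -maxdepth 1 -type f -name 'regex'
parallel --gnu --profile manyhosts -n 10 find {} -name 'regex' ::: /some/path/*/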
Note, however, that recursively listing a directory tree like this is an I/O-bound task, and any speedup will depend on the backing medium. On a spinning hard disk, parallel traversal will probably just be slower, since the heads end up seeking back and forth between the competing finds (if you benchmark this, beware of disk caching).
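If you do benchmark it, one way to get cold-cache numbers on Linux is to drop the kernel's page and dentry caches between runs (requires root); a minimal sketch:
# Flush dirty data, then drop the page cache, dentries and inodes so the next
# run is not served from memory (Linux only, needs root).
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
time parallel --gnu -n 10 find {} -name 'regex' ::: *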
Upvotes: 5