Arsen Zahray

Reputation: 25287

How do I parallelize archiving of large number of directories?

Here's what I'm doing now:

find "./$compressed_dir_name/" -mindepth 2 -maxdepth 2 -type d| while read file; do archive_compressed "$directory"; done

That works fine, but I'd like to make it faster by running 3 instances of archive_compressed "$directory" in parallel.

I could write something like while read -r directory; do archive_compressed "$directory" & done, but there are several thousand directories to process, and I don't think starting that many processes at once is a good idea.

Instead, I'd like to limit it to 2-3 parallel processes at any moment in time.

How do I do that?

Upvotes: 1

Views: 53

Answers (1)

Nate Eldredge

Reputation: 58032

Use xargs with the -P option:

find "./$compressed_dir_name/" -mindepth 2 -maxdepth 2 -type d | xargs -P3 -n1 archive_compressed

The -n1 option says that each invocation of the command archive_compressed should be passed exactly one argument read from standard input, and -P3 says to run up to 3 processes at a time.
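Note that xargs can only execute programs, not shell functions. If archive_compressed happens to be a function defined in your script rather than a standalone command on your PATH (an assumption; your setup may differ), one way to make it reachable from xargs is to export it and call it through bash -c, roughly like this:

# sketch: only needed if archive_compressed is a shell function, not an executable
export -f archive_compressed
find "./$compressed_dir_name/" -mindepth 2 -maxdepth 2 -type d |
  xargs -P3 -n1 bash -c 'archive_compressed "$1"' _

Here bash -c receives the directory as $1 (the _ fills the $0 slot), so the function is invoked once per directory with up to 3 copies running concurrently.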

If you want to safely handle pathnames that may contain spaces or newlines, use this instead:

find "./$compressed_dir_name/" -mindepth 2 -maxdepth 2 -type d -print0 | xargs --null -P3 -n1 archive_compressed

If you need fancier options, look into GNU parallel.
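For instance, a roughly equivalent GNU parallel invocation (a sketch, assuming GNU parallel is installed and archive_compressed is callable from a shell) might look like:

# -0 reads null-delimited input, -j3 limits it to 3 jobs, {} is the directory
find "./$compressed_dir_name/" -mindepth 2 -maxdepth 2 -type d -print0 |
  parallel -0 -j3 archive_compressed {}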

Note that if your task is disk-bound instead of CPU-bound, running multiple processes in parallel will probably make it slower.

Upvotes: 2
