zachguo

Reputation: 6756

GNU Parallel, too many input files, Argument list too long

I run a command like this on my MacBook, using GNU Parallel:

parallel "sample operation" ::: samplefolder/*.txt

The problem is that samplefolder contains 20,000 .txt files, which causes an "Argument list too long" error.

The same command runs without this problem on an Ubuntu machine.

I tried googling and reading some man pages, but no luck. How can I solve this problem?

Thanks!

Upvotes: 8

Views: 4921

Answers (4)

Florian Castellane

Reputation: 1227

Handle that operation in smaller batches using -N, and pipe the input file list rather than giving it on the command line.

For example, expanding on ArtemB's answer, to process in batches of 16 files (warning, this will break with paths containing newlines):

find samplefolder -type f -name "*.txt" | parallel -N16 "sample operation" {}

To tailor the maximum number of arguments you can check getconf ARG_MAX in your environment. For example:

$ getconf ARG_MAX
2097152

Given that paths on *nix can be up to 4096 characters long (PATH_MAX on Linux), that leaves room for roughly 2097152/4096 = 512 file paths on the command line in the worst case (not counting the "sample operation" command itself, or the environment variables, which share the same limit).

So something like

find samplefolder -name "*.txt" | parallel -N500 "sample operation" {}

would let me process in batches of 500. Of course, depending on what tool you are running, you may want to experiment and optimize the batch size for speed.
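The arithmetic above can be sketched as a small calculation. This is only a rough, worst-case estimate: the 4096-byte path length is an assumption taken from the Linux limit, and the variable names are made up for illustration.

```shell
# Worst-case estimate of how many maximum-length paths fit on one command line.
# Assumption: 4096 bytes per path (Linux PATH_MAX); other systems may differ.
arg_max=$(getconf ARG_MAX)
path_max=4096
batch_size=$(( arg_max / path_max ))
echo "ARG_MAX=$arg_max, conservative batch size=$batch_size"
```

Real file names are usually far shorter than 4096 bytes, so larger batches will often fit; the result is a safe starting point rather than a hard limit.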

Upvotes: 1

Zaikun Xu

Reputation: 1473

Just make that sample operation a bash script (run.sh here) and call it like this:

find samplefolder -name '*.txt' -print0 | xargs -P 8 -n 1 -0 ./run.sh
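A minimal sketch of what such a wrapper could look like. The contents of run.sh are hypothetical: wc -c stands in for the real "sample operation", and the temporary directory and file names exist only for this demonstration.

```shell
# Create a throwaway working directory holding the wrapper and two input files.
work_dir=$(mktemp -d)

cat > "$work_dir/run.sh" <<'EOF'
#!/bin/sh
# Hypothetical wrapper: xargs passes exactly one file path as "$1".
# wc -c is a placeholder for the real "sample operation".
wc -c < "$1"
EOF

printf 'hello\n' > "$work_dir/one.txt"
printf 'hi\n' > "$work_dir/two.txt"

# Run the wrapper once per file, up to 8 at a time, then sum the byte counts.
# The script is invoked via sh, so it does not need the executable bit here.
total=$(find "$work_dir" -name '*.txt' -print0 \
  | xargs -0 -n 1 -P 8 sh "$work_dir/run.sh" \
  | awk '{ s += $1 } END { print s }')
echo "total bytes: $total"

rm -rf "$work_dir"
```

Quoting "$1" inside the wrapper keeps file names with spaces intact.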

Upvotes: 0

ArtemB

Reputation: 3632

Here's how you can deal with this on a typical UNIX box (I assume OSX has find and xargs too):

find samplefolder -name \*.txt -print0 | xargs -P 8 -n 1 -0 sample operation

find prints the names of all .txt files in samplefolder, separated by NUL characters. xargs in turn reads this NUL-separated list (-0) and, for every N files (-n 1, so for each file in this case), launches sample operation path/file.txt, running up to 8 of them in parallel (-P 8).
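To see the pipeline in action without touching real data, here is a small self-contained sketch. The temporary directory, the file names, and the use of echo as a placeholder for the real command are all assumptions for illustration.

```shell
# Build a throwaway directory with a few files (one name contains a space).
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b c.txt" "$demo_dir/d.txt" "$demo_dir/skip.log"

# -print0 / -0: NUL-separated names, so spaces in names are handled safely.
# -n 1: one file per invocation; -P 4: up to 4 invocations at once.
# "echo processing" is a placeholder for the real command.
results=$(find "$demo_dir" -name '*.txt' -print0 \
  | xargs -0 -n 1 -P 4 echo processing \
  | sort)
echo "$results"

# Count how many files were handled (skip.log is not matched by the pattern).
count=$(printf '%s\n' "$results" | wc -l | tr -d ' ')

rm -rf "$demo_dir"
```

Because the jobs run concurrently, their output order is not guaranteed, which is why the sketch sorts it before printing.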

Upvotes: 2

Ole Tange

Reputation: 33748

Try:

ls samplefolder | grep '\.txt$' | parallel "sample operation samplefolder/{}"
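This works because the file names arrive on standard input instead of being passed as arguments to an external command. The same idea can be sketched with the shell's printf builtin, which expands the glob inside the shell itself. The directory and file names below are made up for the demonstration, and the sketch only counts the names rather than running a real job.

```shell
# Throwaway directory with three sample files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.txt"

# printf is a shell builtin, so the expanded glob is not passed through exec
# and the "Argument list too long" limit does not apply to it.
# Each name is terminated by a NUL byte; counting the NULs counts the names.
n=$(printf '%s\0' "$demo_dir"/*.txt | tr -cd '\0' | wc -c | tr -d ' ')
echo "names emitted: $n"

rm -rf "$demo_dir"
```

The NUL-separated stream produced this way could be piped into parallel -0 "sample operation" in place of the counting step.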

Upvotes: 4
