Reputation: 105
I created a script to verify a (big) number of items, and it was doing the verification serially (one item after the other), so the script took about 9 hours to complete. Looking around for how to improve this, I found GNU parallel, but I'm having problems making it work.
The list of items is in a text file so I was doing the following:
readarray -t items < "${ALL_ITEMS}"
export -f process_item
parallel process_item ::: "${items[@]}"
Problem is, I receive an error:
GNU parallel: Argument list too long
I understand from similar posts 1, 2, 3 that this is more a Linux limitation than a GNU parallel one. Based on the answers to those posts I also tried to extrapolate a workaround by piping the items to head, but the result is that only a few items (the number passed to head) are processed.
I have been able to make it work using xargs:
cat "${ALL_ITEMS}" | xargs -n 1 -P ${THREADS} -I {} bash -c 'process_item "$@"' _ {}
but I've seen GNU parallel has other nice features I'd like to use.
Any idea how to make this work with GNU parallel? By the way, the number of items is about 2.5 million and growing every day (the script runs as a cron job).
Thanks
Upvotes: 3
Views: 2217
Reputation: 33317
You can pipe the file to parallel, or just use the -a (--arg-file) option. The following are equivalent:
cat "${ALL_ITEMS}" | parallel process_item
parallel process_item < "${ALL_ITEMS}"
parallel -a "${ALL_ITEMS}" process_item
parallel --arg-file "${ALL_ITEMS}" process_item
parallel process_item :::: "${ALL_ITEMS}"
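A minimal, self-contained sketch of the file-argument form above. Here process_item is a hypothetical stand-in for the real verification function, and the three sample items replace the real 2.5M-line file; the xargs branch is only a fallback so the sketch runs where GNU parallel is not installed:

```shell
#!/usr/bin/env bash
# process_item is a hypothetical stand-in for the real verification function.
process_item() {
    printf 'verified %s\n' "$1"
}
export -f process_item    # child shells spawned by parallel need the function

# Small sample list; the real ${ALL_ITEMS} file has ~2.5M lines.
ALL_ITEMS=$(mktemp)
printf '%s\n' item1 item2 item3 > "${ALL_ITEMS}"

# Reading items from the file (instead of putting them on the command line)
# sidesteps the kernel's "Argument list too long" limit.
# -j sets the number of simultaneous jobs, like xargs -P.
if command -v parallel >/dev/null 2>&1; then
    out=$(parallel -j 4 process_item :::: "${ALL_ITEMS}" | sort)
else
    # Fallback so the sketch also runs where GNU parallel is absent.
    out=$(xargs -n 1 -P 4 -I {} bash -c 'process_item "$@"' _ {} < "${ALL_ITEMS}" | sort)
fi
printf '%s\n' "${out}"
rm -f "${ALL_ITEMS}"
```

Output order from parallel jobs is nondeterministic, hence the sort before comparing results.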
Upvotes: 3
Reputation: 1787
From man parallel:
parallel [options] [command [arguments]] < list_of_arguments
So:
export -f process_item
parallel process_item < "${ALL_ITEMS}"
probably does what you want.
Upvotes: 4