Reputation:
I have this bash code:
#!/bin/sh
for f in /path/*.html
do
    python3 main.py $f &
done
The /path/*.html glob matches over 6000 files. What I want to do is run the Python script on the first 100 files simultaneously, and once they are all done, run it on the next 100, and so on.
Upvotes: 1
Views: 1069
Reputation: 531095
Given that you don't need to keep exactly 100 processes running at all times (or as close to that as possible), you can just keep a counter that tells you when to wait for the current batch of processes to complete:
i=0
for f in /path/*.html; do
    python3 main.py "$f" &
    i=$((i + 1))
    if [ "$i" -eq 100 ]; then
        wait   # block until the whole batch of 100 has finished
        i=0
    fi
done
wait  # In case the loop exits before i==100
Upvotes: 1
Reputation: 22262
You could certainly put the multiprocessing inside Python, using threading, multiprocessing, or a wrapper like dask.
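For illustration, here is a minimal sketch of the in-Python route using the standard library's multiprocessing.Pool. It assumes main.py exposes a function for a single file (hypothetically named process_file here); adjust the import, function name, and path to match your code:
#!/usr/bin/env python3
# run_all.py -- minimal sketch, not a drop-in replacement for main.py.
# Assumes main.py defines process_file(path) handling one HTML file.
import glob
from multiprocessing import Pool

from main import process_file  # hypothetical entry point in main.py

if __name__ == "__main__":
    files = glob.glob("/path/*.html")
    # A pool of 100 worker processes; imap_unordered hands each worker
    # a new file as soon as it finishes its previous one.
    with Pool(processes=100) as pool:
        for _ in pool.imap_unordered(process_file, files):
            pass
Unlike the batch-of-100 shell loop, a pool keeps all 100 workers busy continuously instead of waiting for the slowest file in each batch.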
If you want the command-line repetition, I highly recommend GNU parallel for running one command with lots of files/arguments. parallel is generally considered a more modern and powerful replacement for xargs.
parallel --eta python3 main.py {} ::: /path/*.html
To get it to act on 100 files at once, use the -n parameter:
parallel -n 100 --eta python3 main.py {} ::: /path/*.html
If you have too many arguments, use find to generate the list and pass it in via stdin:
find /path -name \*.html | parallel -n 100 --eta python3 main.py {}
Upvotes: 2
Reputation: 52344
Assuming a GNU userland:
find /path/ -maxdepth 1 -name "*.html" -print0 | xargs -0 -P100 -n1 python3 main.py
will launch 100 instances of the Python script (and start new ones as they exit, until all the files have been processed).
Upvotes: 2