Reputation: 8163
I have a Python script, run.py:
import sys

def do(i):
    # doing something with i, that takes time
    ...

start_i = int(sys.argv[1])
end_i = int(sys.argv[2])
for i in range(start_i, end_i):
    do(i)
Then I run this script:
python run.py 0 1000000
After 30 minutes the script completes. But that's too long for me.
So I create a bash script, run.sh:
python run.py 0 200000 &
python run.py 200000 400000 &
python run.py 400000 600000 &
python run.py 600000 800000 &
python run.py 800000 1000000
Then I run this script:
bash run.sh
After 6 minutes the script completes. Rather good; I'm happy.
But I think there is another way to solve the problem (without creating a bash script), isn't there?
Upvotes: 6
Views: 4186
Reputation: 78660
You're looking for the multiprocessing package, and especially the Pool
class:
from multiprocessing import Pool
p = Pool(5) # like in your example, running five separate processes
p.map(do, range(start_i, end_i))
Besides consolidating this into a single command, this has other advantages over calling python run.py 0 200000 & etc. If some chunks take longer than others (so that, say, python run.py 0 200000 finishes before the rest), the pool makes sure all 5 worker processes keep working until everything is done.
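For reference, here is a minimal sketch of how run.py itself could be rewritten around Pool, assuming the do function and command-line arguments from the question (the __main__ guard matters on platforms where multiprocessing spawns a fresh interpreter for each worker, such as Windows):

from multiprocessing import Pool
import sys

def do(i):
    # doing something with i, that takes time
    ...

if __name__ == '__main__':
    start_i = int(sys.argv[1])
    end_i = int(sys.argv[2])
    # five worker processes, like the five invocations in run.sh
    with Pool(5) as p:
        p.map(do, range(start_i, end_i))

The script is then invoked once, e.g. python run.py 0 1000000, and the splitting of work across processes is handled internally.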
Note that depending on your computer's architecture, running too many processes at the same time might slow them all down (for starters, it depends on how many cores your processor has, as well as what else you are running at the same time).
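If you'd rather not guess at a good pool size, multiprocessing.cpu_count() reports the number of cores; a variant of the Pool line from the sketch above (reusing do, start_i and end_i) could look like:

from multiprocessing import Pool, cpu_count

with Pool(cpu_count()) as p:  # one worker per available core
    p.map(do, range(start_i, end_i))

In fact, Pool() with no argument already defaults to the number of CPUs, so this is mainly useful as a starting point when tuning the worker count.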
Upvotes: 10
Reputation: 49920
You could have your Python program create the independent processes, instead of having bash do it, but that's not much different. What is it about your solution that you find deficient?
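For illustration, a rough sketch of what that could look like with the standard subprocess module, launching the same five chunked invocations that run.sh starts (the chunk boundaries are the ones from the question):

import subprocess
import sys

# start the five chunked runs in parallel, then wait for all of them
chunks = [(0, 200000), (200000, 400000), (400000, 600000),
          (600000, 800000), (800000, 1000000)]
procs = [subprocess.Popen([sys.executable, 'run.py', str(a), str(b)])
         for a, b in chunks]
for proc in procs:
    proc.wait()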
Upvotes: 0