Reputation: 2409
I want to do batch processing of files on multiple cores. I have the following scenario:
How do I achieve this? I am confused by multiprocessing.Pool(), multiprocessing.Process(), and various other options.
Thanks.
Upvotes: 5
Views: 8999
Reputation: 365817
Although Jared's answer is great, I personally would use a ProcessPoolExecutor from the futures module, and not even worry about multiprocessing:
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor(max_workers=4) as executor:
    result = sum(executor.map(process_file, files))
When it gets a little more complicated, the future object, or futures.as_completed, can be really nifty compared to the multiprocessing equivalents. When it gets a lot more complicated, multiprocessing is a whole lot more flexible and powerful. But when it's this trivial, really, it's almost hard to tell the difference.
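For example, here is a minimal sketch (reusing the hypothetical process_file function and files list from above) of how futures.as_completed lets you consume results as the workers finish, rather than in submission order:

from concurrent.futures import ProcessPoolExecutor, as_completed

with ProcessPoolExecutor(max_workers=4) as executor:
    # submit() returns one future per file; as_completed yields them as they finish
    futures = [executor.submit(process_file, f) for f in files]
    total = 0
    for future in as_completed(futures):
        total += future.result()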
Upvotes: 2
Reputation: 298256
It's pretty simple.
from multiprocessing import Pool

def process_file(filename):
    return filename

if __name__ == '__main__':
    pool = Pool()
    files = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    results = pool.imap(process_file, files)
    for result in results:
        print result
Pool automatically defaults to the number of processor cores that you have. Also, make sure that your processing function is importable from the file and that your multiprocessing code is inside the if __name__ == '__main__': block. If not, you'll create a fork bomb and lock up your computer.
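As a minimal sketch of that layout (the workers.py module name and its contents are hypothetical; the point is only that the worker lives somewhere importable and the Pool is created under the guard):

# workers.py (hypothetical helper module; the child processes must be able to import it)
def process_file(filename):
    return filename  # stand-in for the real per-file work

# main.py
from multiprocessing import Pool
from workers import process_file

if __name__ == '__main__':
    pool = Pool()  # defaults to the number of CPU cores
    for result in pool.imap(process_file, [1, 2, 3, 4, 5]):
        print result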
Upvotes: 3
Reputation: 26397
In order to demonstrate Pool, I'm going to suppose your working function, which consumes a filename and produces a number, is named work, and that the 20 files are labeled 1.txt, ..., 20.txt. One way to set this up would be as follows:
from multiprocessing import Pool
pool = Pool(processes=4)
result = pool.map_async(work, ("%d.txt" % n for n in xrange(1, 21)))
print sum(result.get())
This method will do the work of steps 3 and 4 for you.
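For completeness, here is a minimal sketch of what work might look like (the line-counting body is purely hypothetical; substitute your real per-file processing that returns a number):

def work(filename):
    # Hypothetical example: return the number of lines in the file.
    with open(filename) as f:
        return sum(1 for _ in f)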
Upvotes: 7