Reputation: 437
The code below handles a huge amount of data, and I want to ask how I can use Python's multiprocessing module to parallelize it and speed things up. Any help is appreciated.
import itertools
import operator

def code_counter(patients, codes):
    # patients must be sorted by 'ID' for groupby to group correctly
    for key, group in itertools.groupby(patients, key=operator.itemgetter('ID')):
        group_codes = [item['CODE'] for item in group]
        yield [group_codes.count(code) for code in codes]

pats = []
for chunk in code_counter(patients, codes):
    pats.append(chunk)
Upvotes: 0
Views: 73
Reputation: 1275
I think your problem lies in the use of yield. As far as I know, you can't yield data across process boundaries. I understand that you use yield because you can't load all the data at once, as that would overload the RAM.
Maybe you can take a look at the multiprocessing Queue: http://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes
I didn't really understand what you are trying to do with your code, so I can't give a precise example tailored to it, but here is a rough sketch of the Queue idea:
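This is only a minimal sketch: the main process still does the groupby, but the counting of each group happens in worker processes. The names worker, task_queue and result_queue are mine, and I'm assuming patients and codes are defined as in your question (with patients sorted by 'ID'):

import itertools
import multiprocessing
import operator

def worker(task_queue, result_queue, codes):
    # Pull one patient's list of codes at a time; None is the stop signal.
    for group_codes in iter(task_queue.get, None):
        result_queue.put([group_codes.count(code) for code in codes])

if __name__ == '__main__':
    task_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()

    workers = [multiprocessing.Process(target=worker,
                                       args=(task_queue, result_queue, codes))
               for _ in range(multiprocessing.cpu_count())]
    for w in workers:
        w.start()

    # Feed one task per patient group; only the group's codes are sent,
    # so the whole data set never has to sit in one list at once.
    n_tasks = 0
    for key, group in itertools.groupby(patients, key=operator.itemgetter('ID')):
        task_queue.put([item['CODE'] for item in group])
        n_tasks += 1

    # One stop signal per worker.
    for _ in workers:
        task_queue.put(None)

    # Collect exactly one result per task.
    pats = [result_queue.get() for _ in range(n_tasks)]

    for w in workers:
        w.join()

Note that the results come back in whatever order the workers finish, so pats is no longer ordered by patient ID like in your loop; if the order matters, you'd have to tag each task with its key and sort afterwards.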
Upvotes: 1