Reputation: 203
I'm using Pool.map
for a scoring procedure:
The results are independent.
I'm just wondering if I can avoid the memory demand. At first it seems that every array goes into python and then the 2 and 3 are proceed. Anyway I have a speed improvement.
#data src and sink is in mongodb#
def scoring(some_arguments):
### some stuff and finally persist ###
collection.update({uid:_uid},{'$set':res_profile},upsert=True)
cursor = tracking.find(timeout=False)
score_proc_pool = Pool(options.cores)
#finaly I use a wrapper so I have only the document as input for map
score_proc_pool.map(scoring_wrapper,cursor,chunksize=10000)
Am I doing something wrong or is there a better way with python for this purpose?
Upvotes: 1
Views: 2718
Reputation: 69042
The map
functions of a Pool
internally convert the iterable to a list if it doesn't have a __len__
attribute. The relevant code is in Pool.map_async
, as that is used by Pool.map
(and starmap
) to produce the result - which is also a list.
If you don't want to read all data into memory first, you should use Pool.imap
or Pool.imap_unordered
, which will produce an iterator that will yield the results as they come in.
Upvotes: 1