Reputation: 2238
I'm trying to parallelize a for loop in Python. I get a memory error while building the list of parameter combinations.
I can see why this might fail, but what is another way to do the same thing?
import itertools
import multiprocessing

def curate_results(params):
    LG = params[0]
    ES = params[1]
    ES_S = str(int(ES)/1000)
    if ES_S == LG:
        return [LG, ES]
    else:
        return []

LOAD_GEN_KEYS = range(138259)
ES_DATA_K = range(9606834)
paramlist = list(itertools.product(LOAD_GEN_KEYS, ES_DATA_K))
pool = multiprocessing.Pool()
VALID_TS = pool.map(curate_results, paramlist)
Any help will be appreciated.
Upvotes: 0
Views: 213
Reputation: 59566
You are creating a list of 138259 * 9606834 items. That is too much for your memory.
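To put a number on that, a back-of-the-envelope check (the ~60 bytes per small 2-tuple is a rough CPython estimate, not an exact figure):

```python
# Count of tuples the list() call would materialize
n_items = 138259 * 9606834
print(f"{n_items:,}")  # 1,328,231,262,006 tuples

# Assuming roughly 60 bytes per small 2-tuple (plus list overhead),
# the list alone would need on the order of:
print(f"~{n_items * 60 / 1e12:.0f} TB")  # ~80 TB
```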
I propose using the generator that itertools.product()
actually returns (i.e. the elements aren't stored in memory but are created on the fly during iteration):
paramlist = itertools.product(LOAD_GEN_KEYS, ES_DATA_K)
Since pool.map(curate_results, paramlist)
accepts any iterable, a generator should work as well as the list.
Upvotes: 2