tandem

Reputation: 2238

memory error with itertools when parallelizing for loop

I'm trying to parallelize a for loop in Python, but I get a MemoryError while building the cartesian product of the parameters.

I can see why this might fail, but what is another way to do the same thing?

import itertools
import multiprocessing


def curate_results(params):
    LG, ES = params
    # keep the pair when the ES key, integer-divided by 1000, matches the LG key
    if ES // 1000 == LG:
        return [LG, ES]
    return []


LOAD_GEN_KEYS = range(138259)
ES_DATA_K = range(9606834)
paramlist = list(itertools.product(LOAD_GEN_KEYS, ES_DATA_K))  # MemoryError raised here
pool = multiprocessing.Pool()
VALID_TS = pool.map(curate_results, paramlist)

Any help will be appreciated.

Upvotes: 0

Views: 213

Answers (1)

Alfe

Reputation: 59566

You are creating a list of 138259 * 9606834 ≈ 1.3 trillion items. That is far too much for your memory.
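A rough back-of-envelope check (the per-tuple size is an assumption about a typical 64-bit CPython build; sys.getsizeof() reports your actual value):

import sys

n_pairs = 138259 * 9606834         # 1,328,231,262,006 pairs, ~1.3 trillion
per_tuple = sys.getsizeof((0, 0))  # roughly 56-64 bytes per 2-tuple on 64-bit builds
list_slots = n_pairs * 8           # plus an 8-byte pointer per list slot
print((n_pairs * per_tuple + list_slots) // 2**40, "TiB")  # on the order of tens of TiB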

I propose going with the lazy iterator that itertools.product() actually returns (i.e. the elements aren't stored in memory but are created on the fly during iteration):

paramlist = itertools.product(LOAD_GEN_KEYS, ES_DATA_K)

pool.map(curate_results, paramlist) accepts any iterable, but be aware that map() converts its iterable to a list internally in order to chop it into chunks. To actually stay lazy, use pool.imap() (or pool.imap_unordered()) with an explicit chunksize.
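A minimal sketch of the lazy version, assuming the corrected curate_results() from the question; the chunksize of 10000 is an arbitrary choice to tune for your workload:

import itertools
import multiprocessing

def curate_results(params):
    LG, ES = params
    if ES // 1000 == LG:
        return [LG, ES]
    return []

if __name__ == '__main__':
    LOAD_GEN_KEYS = range(138259)
    ES_DATA_K = range(9606834)
    # itertools.product() is consumed lazily, one chunk at a time
    params = itertools.product(LOAD_GEN_KEYS, ES_DATA_K)
    with multiprocessing.Pool() as pool:
        # imap() pulls items from the iterator as needed instead of
        # materializing everything up front the way map() does
        VALID_TS = [r for r in pool.imap(curate_results, params, chunksize=10000) if r]

Filtering the empty results inside the comprehension also keeps VALID_TS small, since the vast majority of the 1.3 trillion pairs will not match.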

Upvotes: 2
