MrLatinNerd

Reputation: 85

Using multiprocessing causes MemoryError

I have some code that is supposed to take a large list of data (stored in a variable called stabiliser_states) and do some analysis on every subset of size set_size. Since the number of subsets can get very large, I've been trying to use the multiprocessing module to parallelise the code. However, for the data that I'm actually interested in, running the code eventually leads to a MemoryError.

Here's the part of the code that has been parallelised. The function check_lin_dep does the data analysis and seems to be working fine.

import multiprocessing as mp
from itertools import combinations

...

pool = mp.Pool()
lin_dep_iter = pool.starmap(
    check_lin_dep,
    # one (combination, n, set_size, index) argument tuple per subset
    ((comb, n, set_size, i) for i, comb in
        enumerate(combinations(stabiliser_states, set_size))),
    chunksize=1000
)
pool.close()

For the data that leads to the MemoryError, the length of stabiliser_states is about 37,000, so for set_size = 2 there are roughly 700 million combinations. My understanding was that starmap accepts any iterable, not just a list, but the error seems to suggest that the program is trying to store the combinations (or some other huge list) in memory somewhere?
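For scale, here's a quick check of that count (assuming Python 3.8+ for math.comb):

import math

# number of size-2 subsets of the ~37,000 stabiliser states
print(math.comb(37_000, 2))  # 684481500, i.e. roughly 700 million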

Upvotes: 1

Views: 788

Answers (2)

MrLatinNerd

Reputation: 85

For my purposes, I discovered that the real problem is that starmap converts the args generator to a list behind the scenes (see How to use a generator as an iterable with Multiprocessing map function), and building that 700-million-element list is what was causing the MemoryError. I was able to solve this by using imap, which consumes its iterable lazily, instead of starmap.
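Here's a minimal sketch of the change, for reference. Two caveats: unlike starmap, imap passes each item as a single argument, so a small top-level wrapper (check_lin_dep_args, a name I've made up here) has to unpack the tuple; and imap returns a lazy iterator, so the results have to be consumed:

import multiprocessing as mp
from itertools import combinations

def check_lin_dep_args(args):
    # imap passes one argument per item; unpack it into check_lin_dep's
    # original (combination, n, set_size, index) parameters
    return check_lin_dep(*args)

pool = mp.Pool()
lin_dep_iter = pool.imap(
    check_lin_dep_args,
    ((comb, n, set_size, i) for i, comb in
        enumerate(combinations(stabiliser_states, set_size))),
    chunksize=1000
)
for result in lin_dep_iter:
    ...  # consume each result as it arrives
pool.close()

The wrapper has to be defined at module level so it can be pickled for the worker processes; a lambda won't work here.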

Upvotes: 1

tomer

Reputation: 154

It seems this has already been answered here: Python Multiprocessing.Pool lazy iteration

Upvotes: 1
