MaxG

Reputation: 5

Python's random.sample and stack overflow

I've been working on a large Python scientific project, and I'm encountering a stack overflow issue that basically involves random.sample() and multiprocessing. I have a Goo object containing a large population of Foo objects, which seek to make friends. To do so, each one picks p other Foos belonging to the Goo at random using random.sample(). Once they are all done, the program stops.

It goes this way:

foo.py

class Foo(object):
    def __init__(self):
        self.friends = []

goo.py:

from foo import Foo
import random

class Goo(object):
    def __init__(self, nfoo):
        self.foo_list = [Foo() for i in range(nfoo)]

    def sim_goo(self):
        for f in self.foo_list:
            candidates = random.sample(self.foo_list, 5)
            f.friends = candidates

and main.py:

from goo import Goo

def do_sim(argument):
    g = Goo(argument)
    g.sim_goo()
    return g

and using Jupyter I run:

from main import do_sim
from multiprocessing import Pool
pool = Pool(processes = 2)
para_list = [1000, 1000]
result = pool.map_async(do_sim, para_list).get()

which raises a MaybeEncodingError: Error sending result: '[<goo.Goo object at 0x0000003B15E60A90>]'. Reason: 'RecursionError('maximum recursion depth exceeded while pickling an object',)'

Since it works with para_list = [10, 10], I can only imagine that the error is raised because random.sample() has too much to handle when the list it picks from is too large, which becomes problematic when using multiprocessing. But 1000 Foos isn't much.

Does anybody know an alternative?

Thanks for your time!

Best,

Upvotes: 0

Views: 116

Answers (1)

jasonharper

Reputation: 9587

In order for your Goo object to be pickled (to transport it to another process), all of its Foos must first be pickled. To pickle each of those Foos, all of their friend Foos must first be pickled. And to pickle those Foos... and so on. It's quite likely that there's going to be a chain of friendships that runs through all 1000 Foos (and therefore requires a stack depth during the pickling process somewhat over 1000) - but the default recursion limit (on my copy of Python, at least) is only 1000.

If you can live with 1000 Foos as your limit, you can probably bump up the recursion limit a bit - sys.setrecursionlimit(1050), perhaps. If you're ever going to need substantially more, a different approach is needed. The first thing that comes to my mind is to store each Foo's friendships as a list of indices into the Goo's foo_list, rather than actual references to other Foos.

Upvotes: 2

Related Questions