J. Doe

Reputation: 43

Python TypeError: can't pickle generator objects when using generators in combination with multiprocessing

I have a function, say fun(), which returns a generator. I want to check whether the generator is empty, and since I want to save as much run time as possible, I don't convert it to a list and check whether that list is empty. Instead I do this:

import itertools

def peek(iterable):
    try:
        first = next(iterable)
    except StopIteration:
        return None
    return first, itertools.chain([first], iterable)
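
To illustrate, peek() gives me None for an empty generator, and otherwise the first element plus an iterator with that element put back:

empty = (x for x in [])
print(peek(empty))        # None -> the generator was empty

filled = (x for x in [1, 2, 3])
first, rest = peek(filled)
print(first)              # 1
print(list(rest))         # [1, 2, 3] -- the first element is restored via chain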

I use multiprocessing like this:

def call_generator_obj(ret_val):
    # True if the generator turned out to be empty, False otherwise
    peeked = peek(ret_val)
    if peeked is not None:
        return False
    else:
        return True


def main():
    import multiprocessing as mp
    pool = mp.Pool(processes=mp.cpu_count() - 1)
    results = []
    # for loop over here
        ret_val = fun(args, kwargs)
        results.append(pool.apply(call_generator_obj, args=(ret_val,)))
        # the above line throws the error

As far as I know, pickling is converting some object in memory to a byte stream, and I don't think I am doing anything like that in any of my functions.

TRACEBACK (from the line marked above):

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 253, in apply
    return self.apply_async(func, args, kwds).get()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 385, in _handle_tasks
    put(task)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: can't pickle generator objects

Upvotes: 2

Views: 11690

Answers (1)

abarnert

Reputation: 365707

> As far as I know pickling is when converting some object in memory to a byte stream and I don't think I am doing anything like that over here.

Well, you are doing that.

You can't pass Python values directly between processes. Even the simplest variable holds a pointer to a structure somewhere in the process's memory space, and just copying that pointer to a different process would give you a segfault or garbage, depending on whether the same memory space in the other process is unmapped or mapped to something completely different. Something as complex as a generator—which is basically a live stack frame—would be even more impossible.
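
You can see the generator half of that without multiprocessing at all: pickle itself refuses. A minimal check (nothing specific to your code):

import pickle

gen = (x * x for x in range(10))  # a generator is a paused stack frame, not plain data
pickle.dumps(gen)                 # TypeError: can't pickle generator objects
                                  # (newer Pythons word it "cannot pickle 'generator' object")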

The way multiprocessing solves that is to transparently pickle everything you give it to pass. Functions, and their arguments, and their return values, all need to be pickled.
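
So everything you hand to pool.apply has to survive a round trip through pickle. One way around it is to ship only picklable inputs and build the generator inside the worker. A rough sketch under that assumption (it presumes fun and peek are defined at module level so the workers can reach them, that args and kwargs are themselves picklable, and check_empty is just an illustrative name):

def check_empty(args, kwargs):
    # the generator is created in the child process, so it never has to be pickled
    ret_val = fun(args, kwargs)
    return peek(ret_val) is None   # only a plain bool travels back to the parent

# in main():
#     results.append(pool.apply(check_empty, args=(args, kwargs)))

The return value is pickled on the way back too, so the worker should return something simple like that bool, not the chained iterator that peek builds.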

If you want to know how it works under the covers: a Pool essentially works by having a Queue that the parent puts tasks on, where the tasks are basically (func, args) pairs, and the children take tasks off. And a Queue essentially works by calling pickle.dumps(value) and then writing the result to a pipe or other inter-process communication mechanism.
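
Stripped of the worker management, the core of that round trip looks roughly like this (a toy sketch of the idea, not the actual multiprocessing internals):

import pickle

def add(a, b):
    return a + b

# parent side: serialize the (func, args) task before it goes down the pipe
wire = pickle.dumps((add, (2, 3)))

# child side: rebuild the task and run it
func, args = pickle.loads(wire)
print(func(*args))   # 5

With your code, that dumps step is exactly where the generator in args blows up.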

Upvotes: 2
