Reputation: 43
I have a function say fun()
which yields
generators. I want to check whether the generator is empty, and since i want to save as much run time as possible, I
don't convert it to a list and check whether its empty.
Instead I do this:
def peek(iterable):
try:
first = next(iterable)
except StopIteration:
return None
return first, itertools.chain([first], iterable)
I use multiprocessing like this:
def call_generator_obj(ret_val):
next = peek(ret_val)
if next is not None:
return False
else:
return True
def main():
import multiprocessing as mp
pool = mp.Pool(processes=mp.cpu_count()-1)
# for loop over here
ret_val = fun(args, kwargs)
results.append(pool.apply(call_generator_obj, args=(ret_val,))
# the above line throws me the error
As far as I know pickling is when converting some object in memory to a byte stream and I am doing anything like that in any of my functions.
TRACEBACK: (after the pointed line)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 253, in apply
return self.apply_async(func, args, kwds).get()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 608, in get
raise self._value
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: can't pickle generator objects
Upvotes: 2
Views: 11690
Reputation: 365707
As far as I know pickling is when converting some object in memory to a byte stream and I don't think I am doing anything like that over here.
Well, you are doing that.
You can't pass Python values directly between processes. Even the simplest variable holds a pointer to a structure somewhere in the process's memory space, and just copying that pointer to a different process would give you a segfault or garbage, depending on whether the same memory space in the other process is unmapped or mapped to something completely different. Something as complex as a generator—which is basically a live stack frame—would be even more impossible.
The way multiprocessing
solves that is to transparently pickle everything you give it to pass. Functions, and their arguments, and their return values, all need to be pickled.
If you want to know how it work under the covers: a Pool
essentially works by having a Queue
that the parent put
s tasks on—where the tasks are basically (func, args)
pairs—and the children get
tasks off. And a Queue
essentially works by calling pickle.dumps(value)
and then writing the result to a pipe or other inter-process-communication mechanism.
Upvotes: 2