Simd

Reputation: 21343

How to create a python module in C++ that multiprocessing does not support

I am trying, and so far failing, to reproduce and understand a problem I saw where multiprocessing failed when using a Python module written in C++. My understanding was that multiprocessing needs to pickle the function it is using, and that this was what failed. So I made my_module.cpp as follows:

#include <pybind11/pybind11.h>

int add(int input_number) {
    return input_number + 10;
}

PYBIND11_MODULE(my_module, m) {
    m.doc() = "A simple module implemented in C++ to add 10 to a number.";
    m.def("add", &add, "Add 10 to a number");
}

After

pip install pybind11

I compiled with:

c++ -O3 -Wall -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) my_module.cpp -o my_module$(python3-config --extension-suffix)

I can import my_module and it works as expected.

I can test if it can be pickled with:

import my_module
import pickle

# Use the add function
print(my_module.add(5))  # Outputs: 15

# Attempt to pickle the module
try:
    pickle.dumps(my_module)
except TypeError as e:
    print(f"Pickling error: {e}")  # Expected error

which outputs Pickling error: cannot pickle 'module' object as expected.

Then I tested multiprocessing and was surprised that it worked. I was expecting it to give a pickling error.

import my_module
from multiprocessing import Pool

# A wrapper function to call the C++ add function
def parallel_add(number):
    return my_module.add(number)

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    try:
        # Create a pool of worker processes
        with Pool(processes=2) as pool:
            results = pool.map(parallel_add, numbers)
        print(results)  # If successful, prints the results
    except Exception as e:
        print(f"Multiprocessing error: {e}")

How can I make a Python module in C++ with pybind11 which fails with multiprocessing because of a pickling error?

I am using Linux.

Upvotes: 1

Views: 117

Answers (1)

Anerdw

Reputation: 1987

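As written, I don't think your code ever tries to pickle a module: pool.map only has to pickle parallel_add, which is sent by name, and the numbers. If you redefine parallel_add to take a module as an argument, then use a partial to pass my_module into it, you can force Python to pickle the module.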

import my_module
from functools import partial
from multiprocessing import Pool

# Same wrapper, but now takes a module as an argument
def parallel_add(module, number):
    return module.add(number)

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    with Pool(processes=2) as pool:
        results = pool.map(partial(parallel_add, my_module), numbers)
    print(results)

This throws the error you were expecting:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/anerdw/stackoverflow/unpickling.py", line 13, in <module>
    results = pool.map(partial(parallel_add, my_module), numbers)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 540, in _handle_tasks
    put(task)
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'module' object
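
To see why the original wrapper worked while the partial does not, it may help to look at what pickle actually serializes. As far as I understand it, a module-level function is pickled by reference (its module and name), so nothing it uses internally gets pickled, whereas a partial is pickled together with its bound arguments. Here is a minimal sketch using only standard-library stand-ins; json and operator are my assumptions in place of my_module, which is not needed to show the effect:

```python
import json
import operator
import pickle
from functools import partial

# A module-level function is pickled by reference: the payload holds
# only the names needed to look it up again, not the function body.
payload = pickle.dumps(json.dumps)
print(b"json" in payload, b"dumps" in payload)
print(pickle.loads(payload) is json.dumps)

# A partial is pickled together with its bound arguments.
add_ten = pickle.loads(pickle.dumps(partial(operator.add, 10)))
print(add_ten(5))

# Binding a module as an argument therefore drags the module into the pickle.
try:
    pickle.dumps(partial(getattr, json, "dumps"))
except TypeError as e:
    module_error = str(e)
    print(f"Pickling error: {e}")
```

So the wrapper in the question never exposes my_module to pickle; each worker just looks up parallel_add by name and finds my_module already imported.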

You can also get a pickling-related multiprocessing error more directly by cutting out the wrapper and passing the C++ function itself to pool.map, so that multiprocessing has to pickle it.

import my_module
from multiprocessing import Pool

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    with Pool(processes=2) as pool:
        results = pool.map(my_module.add, numbers)
    print(results)

This fails with a different pickling error:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/anerdw/stackoverflow/unpickling.py", line 8, in <module>
    results = pool.map(my_module.add, numbers)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 540, in _handle_tasks
    put(task)
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'PyCapsule' object
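
The 'PyCapsule' in this message is a hint about the mechanism. My understanding (an assumption about CPython and pybind11 internals, not something the traceback proves) is that a built-in function bound to something other than a module is pickled as getattr(self, name), so pickle has to serialize its `__self__`. For a pybind11 function in this setup, `__self__` appears to be a capsule, which cannot be pickled. The sketch below reproduces the same pattern with standard-library objects only; the lock is a stand-in for the capsule:

```python
import pickle
import threading

# A built-in whose __self__ is a module pickles by name, like any global.
print(type(len.__self__).__name__)
restored_len = pickle.loads(pickle.dumps(len))
print(restored_len is len)

# A built-in bound to a picklable object pickles its __self__ too.
restored_append = pickle.loads(pickle.dumps([1, 2].append))
print(restored_append.__self__)

# A built-in bound to an unpicklable object fails, naming that object's type.
lock = threading.Lock()
try:
    pickle.dumps(lock.acquire)
except TypeError as e:
    bound_error = str(e)
    print(f"Pickling error: {e}")
```

With my_module built as in the question, printing type(my_module.add.__self__) should show whether the same thing is happening there. Newer pybind11 releases may behave differently, so treat this as a sketch rather than a guarantee.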

Upvotes: 3
