KBorja

Reputation: 257

Boost Python, propagate C++ callbacks to Python causing segmentation fault

I have the following listener in C++ that receives a Python object to propagate the callbacks.

class PyClient {
    private:
        std::vector<DipSubscription *> subs;

        subsFactory *sub;

        class GeneralDataListener: public SubscriptionListener {
            private:
                PyClient * client;

            public:
                GeneralDataListener(PyClient *c):client(c){
                    client->pyListener.attr("log_message")("Handler created");
                }

                void handleMessage(Subscription *sub, Data &message) {
                    // Lock the execution of this method
                    PyGILState_STATE state = PyGILState_Ensure();
                    client->pyListener.attr("log_message")("Data received for topic");
                    ...
                    // This method ends modifying the value of the Python object
                    topicEntity.attr("save_value")(valueKey, extractDipValue(valueKey.c_str(), message));
                    // Release the lock
                    PyGILState_Release(state);
                }

                void connected(Subscription *sub) {
                    client->pyListener.attr("connected")(sub->getTopicName());
                }

                void disconnected(Subscription *sub, char* reason) {
                    std::string s_reason(reason);
                    client->pyListener.attr("disconnected")(sub->getTopicName(), s_reason);
                }

                void handleException(Subscription *sub, Exception &ex) {
                    client->pyListener.attr("handle_exception")(sub->getTopicName(), ex.what());
                }
        };

        GeneralDataListener *handler;

    public:
        python::object pyListener;


        PyClient(python::object pyList): pyListener(pyList) {
            std::ostringstream iss;
            iss << "Listener" << getpid();
            sub = Sub::create(iss.str().c_str());
            createSubscriptions();
        }

        ~PyClient() {
            for (unsigned int i = 0; i < subs.size(); i++) {
                if (subs[i] == NULL) {
                    continue;
                }

                sub->destroySubscription(subs[i]);
            }
        }
};


BOOST_PYTHON_MODULE(pytest)
{
    // There is no need to expose more methods as will be used as callbacks
    Py_Initialize();
    PyEval_InitThreads();
    python::class_<PyClient>("PyClient", python::init<python::object>())
        .def("pokeHandler", &PyClient::pokeHandler);
}

Then, I have my Python program, which is like this:

import sys
import time

import pytest


class Entity(object):
    def __init__(self, entity, mapping):
        self.entity = entity
        self.mapping = mapping
        self.values = {}
        for field in mapping:
            self.values[field] = ""

        self.updated = False

    def save_value(self, field, value):
        self.values[field] = value
        self.updated = True


class PyListener(object):
    def __init__(self):
        self.listeners = 0
        self.mapping = ["value"]

        self.path_entity = {}
        self.path_entity["path/to/node"] = Entity('Name', self.mapping)

    def connected(self, topic):
        print "%s topic connected" % topic

    def disconnected(self, topic, reason):
        print "%s topic disconnected, reason: %s" % (topic, reason)

    def handle_message(self, topic):
        print "Handling message from topic %s" % topic

    def handle_exception(self, topic, exception):
        print "Exception %s in topic %s" % (exception, topic)

    def log_message(self, message):
        print message

    def sample(self):
        for path, entity in self.path_entity.iteritems():
            if not entity.updated:
                return False

            sample = " ".join([entity.values[field] for field in entity.mapping])
            print "%d %s %d %s" % (0, entity.entity, 4324, sample)
            entity.updated = False

        return True


if __name__ == "__main__":
    sys.settrace(trace)
    py_listener = PyListener()
    sub = pytest.PyClient(py_listener)

    while True:
        if py_listener.sample():
            break

My problem is that when the while True loop starts running in the Python program, the script gets stuck checking whether the entity has been updated, and at seemingly random points, when the C++ listener tries to invoke a callback, I get a segmentation fault.

The same happens if I call time.sleep in the Python script and call sample from time to time. I know it would be solved if I called sample from the C++ code, but this script will be run by another Python module that calls the sample method at a specific interval. So the expected behavior is for the C++ code to update the values of the entities and for the Python script to just read them.

I've debugged the error with gdb, but the stack trace I'm getting is not very informative:

#0  0x00007ffff7a83717 in PyFrame_New () from /lib64/libpython2.7.so.1.0
#1  0x00007ffff7af58dc in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#2  0x00007ffff7af718d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#3  0x00007ffff7af7292 in PyEval_EvalCode () from /lib64/libpython2.7.so.1.0
#4  0x00007ffff7b106cf in run_mod () from /lib64/libpython2.7.so.1.0
#5  0x00007ffff7b1188e in PyRun_FileExFlags () from /lib64/libpython2.7.so.1.0
#6  0x00007ffff7b12b19 in PyRun_SimpleFileExFlags () from /lib64/libpython2.7.so.1.0
#7  0x00007ffff7b23b1f in Py_Main () from /lib64/libpython2.7.so.1.0
#8  0x00007ffff6d50af5 in __libc_start_main () from /lib64/libc.so.6
#9  0x0000000000400721 in _start ()

And if I debug with sys.settrace inside Python, the last line before the segmentation fault is always inside the sample method, although the exact line varies.

I'm not sure how to solve this communication problem, so any advice in the right direction would be much appreciated.

Edit: Changed the PyDipClient references to PyClient.

What is happening is that I start the program from the Python main method, and if the C++ listener then tries to call back into the Python listener, it crashes with a segmentation fault. The only thread I believe is created is the one started when I create a subscription, but that code is inside a library and I don't know exactly how it works.

If I remove all the callbacks to the Python listener and invoke the methods from Python instead (like calling pokeHandler), everything works perfectly.

Upvotes: 1

Views: 1872

Answers (1)

Tanner Sansbury

Reputation: 51881

The most likely culprit is that the Global Interpreter Lock (GIL) is not being held by a thread when it is invoking Python code, resulting in undefined behavior. Verify that all paths that make Python calls, such as GeneralDataListener's functions, acquire the GIL before invoking Python code. If copies of PyClient are being made, then pyListener needs to be managed in a manner that allows the GIL to be held when it is copied and destroyed.

Furthermore, consider the rule of three for PyClient. Do the copy-constructor and assignment operator need to do anything with regard to the subscription?
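If the answer to that question is "copies should never happen", one option is to disable copying outright. The following is a minimal sketch under stated assumptions, not the asker's actual class: fake_object is a hypothetical stand-in for boost::python::object so that the snippet compiles without Boost or Python headers, and PyClient is reduced to the single data member that matters here.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for boost::python::object so the sketch compiles
// without Boost or Python headers.
struct fake_object {};

// Hypothetical reduced PyClient: copying is disabled, so the subscription
// handles and the Python reference have exactly one owner.
class PyClient {
public:
  explicit PyClient(const fake_object& listener) : pyListener(listener) {}

  PyClient(const PyClient&) = delete;             // no copy-construction
  PyClient& operator=(const PyClient&) = delete;  // no copy-assignment

private:
  fake_object pyListener;
};

// Compile-time checks that copies really are rejected.
static_assert(!std::is_copy_constructible<PyClient>::value,
              "PyClient must not be copy-constructible");
static_assert(!std::is_copy_assignable<PyClient>::value,
              "PyClient must not be copy-assignable");
```

When exposing such a type through Boost.Python, the class would also need to be declared non-copyable, for example with python::class_<PyClient, boost::noncopyable>("PyClient", python::init<python::object>()), since class_ otherwise registers a converter that copies the C++ object.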


The GIL is a mutex around the CPython interpreter. This mutex prevents parallel operations from being performed on Python objects. Thus, at any point in time, at most one thread, the one that has acquired the GIL, is allowed to perform operations on Python objects. When multiple threads are present, invoking Python code without holding the GIL results in undefined behavior.

C or C++ threads are sometimes referred to as alien threads in the Python documentation. The Python interpreter has no ability to control the alien thread. Therefore, alien threads are responsible for managing the GIL to permit concurrent or parallel execution with Python threads.

In the current code:

  • GeneralDataListener::handleMessage() manages the GIL in a manner that is not exception-safe. For example, if the listener's log_message() method throws an exception, the stack will unwind without releasing the GIL, as PyGILState_Release() will not be invoked.

    void handleMessage(...)
    {
      PyGILState_STATE state = PyGILState_Ensure();
      client->pyListener.attr("log_message")(...);
      ...
    
      PyGILState_Release(state); // Not called if Python throws.
    }
    
  • GeneralDataListener::connected(), GeneralDataListener::disconnected(), and GeneralDataListener::handleException() are explicitly invoking Python code, but do not explicitly manage the GIL. If the caller does not own the GIL, then undefined behavior is invoked as Python code is being executed without the GIL.

    void connected(...)
    {
      // GIL not being explicitly managed.
      client->pyListener.attr("connected")(...);
    }
    
  • PyClient's implicitly created copy-constructor and assignment operator do not manage the GIL, but may indirectly invoke Python code when copying the pyListener data member. If copies are being made, then the caller needs to hold the GIL when the PyClient::pyListener object is being copied and destroyed. If the pyListener is not managed on the free store, then the caller must be Python aware and have acquired the GIL during the destruction of the entire PyClient object.
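The first point can be reproduced without Python. The sketch below is illustrative only: gil_depth, fake_ensure() and fake_release() are hypothetical stand-ins for the GIL, PyGILState_Ensure() and PyGILState_Release(), and failing_callback() stands in for a Python call that raises (Boost.Python surfaces that as a thrown boost::python::error_already_set).

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the GIL: a count of outstanding acquisitions.
// fake_ensure()/fake_release() play the role of PyGILState_Ensure() and
// PyGILState_Release() so the sketch compiles without Python headers.
static int gil_depth = 0;
void fake_ensure()  { ++gil_depth; }
void fake_release() { --gil_depth; }

// Stand-in for a Python callback that raises.
void failing_callback() { throw std::runtime_error("raised in callback"); }

// Same shape as handleMessage() in the question: acquire, call, release.
void handle_message_manual() {
  fake_ensure();
  failing_callback();  // throws, so the next line never runs
  fake_release();
}

// Returns the number of acquisitions left outstanding after the call.
int outstanding_after_manual_call() {
  gil_depth = 0;
  try {
    handle_message_manual();
  } catch (const std::runtime_error&) {
    // The exception has already unwound past the release call.
  }
  return gil_depth;
}
```

Calling outstanding_after_manual_call() reports one acquisition that was never released, which in the real program would correspond to a thread that still holds the GIL after the callback has failed.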

To resolve these, consider:

  • Using a Resource Acquisition Is Initialization (RAII) guard class to help manage the GIL in an exception-safe manner. For example, with the following gil_lock class, when a gil_lock object is created, the calling thread will acquire the GIL. When the gil_lock object is destroyed, it releases the GIL.

    /// @brief RAII class used to lock and unlock the GIL.
    class gil_lock
    {
    public:
      gil_lock()  { state_ = PyGILState_Ensure(); }
      ~gil_lock() { PyGILState_Release(state_);   }
    private:
      PyGILState_STATE state_;
    };
    
    ...
    
    void handleMessage(...)
    {
      gil_lock lock;
      client->pyListener.attr("log_message")(...);
      ...
    }
    
  • Explicitly managing the GIL in any code path that invokes Python code from within an alien thread.

    void connected(...)
    {
      gil_lock lock;
      client->pyListener.attr("connected")(...);
    }
    
  • Making PyClient non-copyable or explicitly creating the copy-constructor and assignment operator. If copies are being made, then change pyListener to be held by a type that allows for explicit destruction while the GIL is being held. One solution is to use a boost::shared_ptr<python::object> that manages a copy of the python::object provided to the PyClient during construction, and has a custom deleter that is GIL aware. Alternatively, one could use something like boost::optional.

    class PyClient
    {
    public:
    
      PyClient(const boost::python::object& object)
        : pyListener(
            new boost::python::object(object),  // GIL locked, so copy.
            [](boost::python::object* object)   // Delete needs GIL.
            {
              gil_lock lock;
              delete object;
            }
          )
      {
        ...
      }
    
    private:
      boost::shared_ptr<boost::python::object> pyListener;
    };
    

    Note that by managing the boost::python::object on the free store, one can freely copy the shared_ptr without holding the GIL. On the other hand, if one was using something like boost::optional to manage the Python object, then one would need to hold the GIL during copy-construction, assignment, and destruction.
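The behavior of the GIL-aware deleter can be checked in isolation. This is a rough sketch with several assumptions: it uses std::shared_ptr rather than boost::shared_ptr, gil_held is a hypothetical flag modelling whether the calling thread holds the GIL, fake_gil_lock mirrors the gil_lock class on top of that flag, and fake_object stands in for boost::python::object by recording whether the flag was set when it was destroyed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins so the sketch compiles without Python or Boost.
static bool gil_held = false;
static int destroyed_with_gil = 0;
static int destroyed_without_gil = 0;

// Models boost::python::object: records the lock state at destruction.
struct fake_object {
  ~fake_object() {
    if (gil_held) { ++destroyed_with_gil; } else { ++destroyed_without_gil; }
  }
};

// Same shape as gil_lock, built on the stand-in flag.
class fake_gil_lock {
public:
  fake_gil_lock() : previous_(gil_held) { gil_held = true; }
  ~fake_gil_lock() { gil_held = previous_; }
  fake_gil_lock(const fake_gil_lock&) = delete;
  fake_gil_lock& operator=(const fake_gil_lock&) = delete;
private:
  bool previous_;
};

// Creates the shared object while "holding the GIL", as the PyClient
// constructor would when it is called from Python.
std::shared_ptr<fake_object> make_listener() {
  fake_gil_lock lock;
  return std::shared_ptr<fake_object>(
      new fake_object,
      [](fake_object* object) {
        fake_gil_lock deleter_lock;  // re-acquire before destroying
        delete object;
      });
}

// Copies and drops handles without the lock; returns true if the object
// was destroyed exactly once and only while the lock was held.
bool copies_are_gil_free() {
  destroyed_with_gil = destroyed_without_gil = 0;
  {
    std::shared_ptr<fake_object> first = make_listener();
    std::shared_ptr<fake_object> second = first;  // copy, lock not held
    first.reset();                                // drop, lock not held
    if (destroyed_with_gil != 0 || destroyed_without_gil != 0) return false;
  }  // last handle released here; the deleter takes the lock
  return destroyed_with_gil == 1 && destroyed_without_gil == 0;
}
```

Copying or resetting a handle only touches the reference count, so neither needs the lock. The single destruction happens inside the deleter, where the guard has already been taken.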

Consider reading this answer for more details on callbacks into Python and subtle details, such as GIL management during copy-construction and destruction.

Upvotes: 2
