Venkatavaradhan

Reputation: 139

pthread_recursive_mutex - assertion failed

I'm using the ROS (Robot Operating System) framework. My code does not use activity servers, only plain publishers, subscribers and services. Unfortunately, I'm hitting a pthread_recursive_mutex assertion failure. The error and its backtrace are shown below.

If anyone is familiar with the ROS stack, could you please share what the potential causes of this runtime error might be?

I can give more information about the runtime error if needed. Help is much appreciated. Thanks.

/usr/include/boost/thread/pthread/recursive_mutex.hpp:113: void boost::recursive_mutex::lock(): Assertion `!pthread_mutex_lock(&m)' failed.


Upvotes: 0

Views: 1675

Answers (2)

Paralogos

Reputation: 51

This looks like a use-after-free problem, where a mutex has already been destroyed, probably because its owning object was deleted.
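
As a rough illustration of how this can happen and one possible way to guard against it: a callback that outlives the object owning the mutex will try to lock a destroyed mutex. The sketch below uses only the C++ standard library; Worker and make_safe_callback are hypothetical names, not part of ROS or Boost, and std::recursive_mutex stands in for boost::recursive_mutex. The callback holds a std::weak_ptr and only locks the mutex if the owner is still alive.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>

// Hypothetical stand-in for an object that owns a mutex and handles callbacks.
struct Worker {
    std::recursive_mutex mutex;
    int handled = 0;

    void handle() {
        std::lock_guard<std::recursive_mutex> guard(mutex);
        ++handled;
    }
};

// Builds a callback that touches the Worker (and its mutex) only while the
// Worker still exists. The callback returns true if the call was delivered.
std::function<bool()> make_safe_callback(const std::shared_ptr<Worker>& worker) {
    assert(worker);  // the caller must pass a live object
    std::weak_ptr<Worker> weak = worker;
    return [weak]() {
        if (auto alive = weak.lock()) {  // keeps the Worker alive during the call
            alive->handle();
            return true;
        }
        return false;  // owner already destroyed: skip instead of locking
    };
}
```

If the last shared_ptr to the Worker is released, later invocations of the callback return false instead of locking freed memory. Whether this pattern fits depends on how the node's objects are owned, which the question does not show.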

I had some success using Valgrind to hunt down this type of bug. Install it with apt install valgrind, and add launch-prefix="valgrind" to the <node> element in your launch file. It will be very slow, but it is quite adept at pinpointing these issues.

Take this buggy program for example:

struct Test
{
    int a;
};

int main()
{
    Test* test = new Test();
    test->a = 42;
    delete test;
    test->a = 0; // BUG!
}

valgrind ./testprog yields

==8348== Invalid write of size 4
==8348==    at 0x108601: main (test.cpp:11)
==8348==  Address 0x5b7ec80 is 0 bytes inside a block of size 4 free'd
==8348==    at 0x4C3168B: operator delete(void*, unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==8348==    by 0x108600: main (test.cpp:10)
==8348==  Block was alloc'd at
==8348==    at 0x4C303EF: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==8348==    by 0x1085EA: main (test.cpp:8)

Note how it will not only tell you where the buggy access happened (test.cpp:11), but also where the Test object was deleted (test.cpp:10), and where it was initially created (test.cpp:8).

Good luck in your bug hunt!

Upvotes: 1

sehe

Reputation: 393944

The lock method implementation merely asserts on the pthread return value:

    void lock()
    {
        BOOST_VERIFY(!posix::pthread_mutex_lock(&m));
    }

This means that, according to the docs, one of the following happened:

  • (EAGAIN) The mutex could not be acquired because the maximum number of recursive locks for mutex has been exceeded.

    This would indicate that you have some kind of imbalance in your locks (not at this call site, because unique_lock<> makes sure that doesn't happen), or that you are just racking up threads that are all waiting for the same lock.

  • (EOWNERDEAD) The mutex is a robust mutex and the process containing the previous owning thread terminated while holding the mutex lock. The mutex lock shall be acquired by the calling thread and it is up to the new owner to make the state consistent.

    Boost does not deal with this case and simply asserts. This is also unlikely to occur if all your threads use scoped lock guards (scoped_lock, unique_lock, shared_lock, lock_guard). It could, however, occur if you call lock() (and unlock()) manually somewhere and a thread exits without calling unlock().
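
The difference between manual lock()/unlock() calls and a scoped guard can be sketched as follows. This is only an illustration under stated assumptions: CountingMutex, update_manual and update_guarded are hypothetical names, and the standard library's std::recursive_mutex and std::lock_guard are used in place of their Boost counterparts. The wrapper counts outstanding locks so that an imbalance becomes visible.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical wrapper around a recursive mutex that tracks how many
// lock() calls are still waiting for a matching unlock().
class CountingMutex {
public:
    void lock()   { m_.lock(); ++depth_; }
    void unlock() { assert(depth_ > 0); --depth_; m_.unlock(); }
    int depth() const { return depth_; }
private:
    std::recursive_mutex m_;
    int depth_ = 0;  // only modified while m_ is held
};

// Manual locking: an early exit skips unlock() and leaves the mutex held.
void update_manual(CountingMutex& m, bool fail) {
    m.lock();
    if (fail) throw std::runtime_error("early exit");  // unlock() never runs
    m.unlock();
}

// Scoped guard: the destructor of the guard unlocks on every exit path.
void update_guarded(CountingMutex& m, bool fail) {
    std::lock_guard<CountingMutex> guard(m);
    if (fail) throw std::runtime_error("early exit");  // guard still unlocks
}
```

After update_manual throws, depth() stays at 1 and the mutex remains locked; after update_guarded throws, depth() is back at 0.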

There are some other ways in which mutexes (particularly checked ones) can fail, but those would not apply to boost::recursive_mutex.
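
Since the assertion only says that the call failed, it can help to look at the actual return code of pthread_mutex_lock. The following is a minimal sketch, not Boost code: describe_lock_error and lock_recursively are hypothetical helpers, and the example works on a raw pthread recursive mutex rather than on boost::recursive_mutex. It may need to be built with -pthread on some systems.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <pthread.h>

// Hypothetical helper: turns a pthread_mutex_lock() return code into a
// readable description, covering the two cases discussed above.
std::string describe_lock_error(int rc)
{
    switch (rc) {
    case 0:          return "ok";
    case EAGAIN:     return "EAGAIN: maximum number of recursive locks exceeded";
    case EOWNERDEAD: return "EOWNERDEAD: previous owner terminated while holding the lock";
    default:         return std::string("other: ") + std::strerror(rc);
    }
}

// Hypothetical helper: locks a recursive mutex `depth` times from the calling
// thread, then releases every lock it obtained. Returns 0 on success or the
// first non-zero code reported by pthread_mutex_lock().
int lock_recursively(int depth)
{
    assert(depth >= 0);

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);

    pthread_mutex_t m;
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    int rc = 0;
    int held = 0;
    for (; held < depth; ++held) {
        rc = pthread_mutex_lock(&m);
        if (rc != 0) break;  // held is not incremented for a failed lock
    }

    // Balance every successful lock with exactly one unlock.
    for (int i = 0; i < held; ++i) pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

Logging describe_lock_error(rc) at the failing call site, instead of only asserting, would show which of the cases above applies.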

Upvotes: 1
