Daniel Rudy

Reputation: 1439

Force unlock a mutex that was locked by a different thread

Consider the following test program:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <strings.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>


pthread_mutex_t mutex;
pthread_mutexattr_t mattr;
pthread_t thread1;
pthread_t thread2;
pthread_t thread3;


void mutex_force_unlock(pthread_mutex_t *mutex, pthread_mutexattr_t *mattr)
  {
    int e;
    e = pthread_mutex_destroy(mutex);
    printf("mfu: %s\n", strerror(e));
    e = pthread_mutex_init(mutex, mattr);
    printf("mfu: %s\n", strerror(e));
  }

void *thread(void *d)
  {
    int e;

    e = pthread_mutex_trylock(&mutex);
    if (e != 0)
      {
        printf("thr: %s\n", strerror(e));
        mutex_force_unlock(&mutex, &mattr);
        e = pthread_mutex_unlock(&mutex);
        printf("thr: %s\n", strerror(e));
        if (e != 0) pthread_exit(NULL);
        e = pthread_mutex_lock(&mutex);
        printf("thr: %s\n", strerror(e));
      }
    pthread_exit(NULL);
  }


void * thread_deadtest(void *d)
  {
    int e;
    e = pthread_mutex_lock(&mutex);
    printf("thr2: %s\n", strerror(e));
    e = pthread_mutex_lock(&mutex);
    printf("thr2: %s\n", strerror(e));
    pthread_exit(NULL);
  }


int main(void)
  {
    /* Setup */
    pthread_mutexattr_init(&mattr);
    pthread_mutexattr_settype(&mattr, PTHREAD_MUTEX_ERRORCHECK);
    //pthread_mutexattr_settype(&mattr, PTHREAD_MUTEX_NORMAL);
    pthread_mutex_init(&mutex, &mattr);

    /* Test */
    pthread_create(&thread1, NULL, &thread, NULL);
    pthread_join(thread1, NULL);
    if (pthread_kill(thread1, 0) != 0) printf("Thread 1 has died.\n");
    pthread_create(&thread2, NULL, &thread, NULL);
    pthread_join(thread2, NULL);
    pthread_create(&thread3, NULL, &thread_deadtest, NULL);
    pthread_join(thread3, NULL);
    return(0);
  }

Now when this program runs, I get the following output:

Thread 1 has died.
thr: Device busy
mfu: Device busy
mfu: No error: 0
thr: Operation not permitted
thr2: No error: 0
thr2: Resource deadlock avoided

Now I know this has been asked a number of times before, but is there any way to forcefully unlock a mutex? It seems the implementation will only allow the mutex to be unlocked by the thread that locked it as it seems to actively check, even with a normal mutex type.

Why am I doing this? It has to do with coding a bullet-proof network server that has the ability to recover from most errors, including ones where the thread terminates unexpectedly. At this point, I can see no way of unlocking a mutex from a thread that is different than the one that locked it. So the way that I see it is that I have a few options:

  1. Abandon the mutex and create a new one. This is the undesirable option as it creates a memory leak.
  2. Close all network ports and restart the server.
  3. Go into the kernel internals and release the mutex there bypassing the error checking.

I have asked this before, but the powers that be absolutely want this functionality and they will not take no for an answer (I've already tried), so I'm kinda stuck with this. I didn't design it this way, and I would really like to shoot the person who did, but that's not an option either.

And before someone says anything, my usage of pthread_kill is legal under POSIX...I checked.

I forgot to mention, this is FreeBSD 9.3 that we are working with.

Upvotes: 6

Views: 5632

Answers (5)

Daniel Rudy

Reputation: 1439

I have come up with a workable method to deal with this situation. As I mentioned before, FreeBSD does not support robust mutexes, so that option is out. Also, once a thread has locked a mutex, no other thread can unlock it by any means.

So what I have done to solve the problem is to abandon the mutex and place its pointer onto a list. Since the lock wrapper code uses pthread_mutex_trylock and then relinquishes the CPU if it fails, no thread can get stuck waiting on a permanently locked mutex. In the case of a robust mutex, the thread locking the mutex will be able to recover it if it gets EOWNERDEAD as the return code.

Here are some of the definitions:

/* Checks to see if we have access to robust mutexes. */
#ifndef PTHREAD_MUTEX_ROBUST
#define TSRA__ALTERNATE
#define TSRA_MAX_MUTEXABANDON   (TSRA_MAX_MUTEX * 4)
#endif

/* Mutex: Mutex Data Table Datatype */
typedef struct mutex_lock_table_tag__ mutexlock_t;
struct mutex_lock_table_tag__
  {
    pthread_mutex_t *mutex;     /* PThread Mutex */
    tsra_daclbk audcallbk;      /* Audit Callback Function Pointer */
    tsra_daclbk reicallbk;      /* Reinit Callback Function Pointer */
    int acbkstat;               /* Audit Callback Status */
    int rcbkstat;               /* Reinit Callback Status */
    pthread_t owner;            /* Owner TID */
    #ifdef TSRA__OVERRIDE
    tsra_clnup_t *cleanup;      /* PThread Cleanup */
    #endif
  };

/* ******** ******** Global Variables */

pthread_rwlock_t tab_lock;              /* RW lock for mutex table */
pthread_mutexattr_t mtx_attrib;         /* Mutex attributes */
mutexlock_t *mutex_table;               /* Mutex Table */
int tabsizeentry;                       /* Table Size (Entries) */
int tabsizebyte;                        /* Table Size (Bytes) */
int initialized = 0;                    /* Modules Initialized 0=no, 1=yes */
#ifdef TSRA__ALTERNATE
pthread_mutex_t *mutex_abandon[TSRA_MAX_MUTEXABANDON];
pthread_mutex_t mtx_abandon;            /* Abandoned Mutex Lock */
int mtx_abandon_count;                  /* Abandoned Mutex Count */
int mtx_abandon_init = 0;               /* Initialization Flag */
#endif
pthread_mutex_t mtx_recover;            /* Mutex Recovery Lock */

And here's some code for the lock recovery:

/* Attempts to recover a broken mutex. */
int tsra_mutex_recover(int lockid, pthread_t tid)
  {
    int result;

    /* Check Prerequisites */
    if (initialized == 0) return(EDOOFUS);
    if (lockid < 0 || lockid >= tabsizeentry) return(EINVAL);

    /* Check Mutex Owner */
    result = pthread_equal(tid, mutex_table[lockid].owner);
    if (result != 0) return(0);

    /* Lock Recovery Mutex */
    result = pthread_mutex_lock(&mtx_recover);
    if (result != 0) return(result);

    /* Check Mutex Owner, Again */
    result = pthread_equal(tid, mutex_table[lockid].owner);
    if (result != 0)
      {
        pthread_mutex_unlock(&mtx_recover);
        return(0);
      }

    /* Unless the system supports robust mutexes, there is
       really no way to recover a mutex that is being held
       by a thread that has terminated.  At least in FreeBSD,
       trying to destroy a mutex that is held will result
       in EBUSY.  Trying to overwrite a held mutex results
       in a memory fault and core dump.  The only way to
       recover is to abandon the mutex and create a new one. */
    #ifdef TSRA__ALTERNATE      /* Abandon Mutex */
    pthread_mutex_t *ptr;

    /* Too many abandoned mutexes? */
    if (mtx_abandon_count >= TSRA_MAX_MUTEXABANDON)
      {
        result = TSRA_PROGRAM_ABORT;
        goto error_1;
      }

    /* Get a read lock on the mutex table. */
    result = pthread_rwlock_rdlock(&tab_lock);
    if (result != 0) goto error_1;

    /* Perform associated data audit. */
    if (mutex_table[lockid].acbkstat != 0)
      {
        result = mutex_table[lockid].audcallbk();
        if (result != 0)
          {
            result = TSRA_PROGRAM_ABORT;
            goto error_2;
          }
      }

    /* Allocate New Mutex */
    ptr = malloc(sizeof(pthread_mutex_t));
    if (ptr == NULL)
      {
        result = errno;
        goto error_2;
      }

    /* Init new mutex and abandon the old one. */
    result = pthread_mutex_init(ptr, &mtx_attrib);
    if (result != 0) goto error_3;
    mutex_abandon[mtx_abandon_count] = mutex_table[lockid].mutex;
    mtx_abandon_count++;
    mutex_table[lockid].mutex = ptr;

    /* Release the table lock taken for the audit and swap. */
    pthread_rwlock_unlock(&tab_lock);

    #else       /* Recover Mutex */

    /* Try locking the mutex and see what we get. */
    result = pthread_mutex_trylock(mutex_table[lockid].mutex);
    switch (result)
      {
        case 0:                 /* No error, unlock and return */
          pthread_mutex_unlock(mutex_table[lockid].mutex);
          pthread_mutex_unlock(&mtx_recover);
          return(0);
        case EBUSY:             /* No error, return */
          pthread_mutex_unlock(&mtx_recover);
          return(0);
        case EOWNERDEAD:        /* Error, try to recover mutex. */
          if (mutex_table[lockid].acbkstat != 0)
              {
                result = mutex_table[lockid].audcallbk();
                if (result != 0)
                  {
                    if (mutex_table[lockid].rcbkstat != 0)
                        {
                          result = mutex_table[lockid].reicallbk();
                          if (result != 0)
                            {
                              result = TSRA_PROGRAM_ABORT;
                              goto error_2;
                            }
                        }
                      else
                        {
                          result = TSRA_PROGRAM_ABORT;
                          goto error_2;
                        }
                  }
              }
            else
              {
                result = TSRA_PROGRAM_ABORT;
                goto error_2;
              }
          break;
        case EDEADLK:           /* Error, deadlock avoided, abort */
        case ENOTRECOVERABLE:   /* Error, recovery failed, abort */
          /* NOTE: We shouldn't get this, but if we do... */
          abort();
          break;
        default:
          /* Ambiguous situation, best to abort. */
          abort();
          break;
      }
    pthread_mutex_consistent(mutex_table[lockid].mutex);
    pthread_mutex_unlock(mutex_table[lockid].mutex);
    #endif

    /* Housekeeping */
    mutex_table[lockid].owner = pthread_self();
    pthread_mutex_unlock(&mtx_recover);

    /* Return */
    return(0);

    /* We only get here on errors. */
    #ifdef TSRA__ALTERNATE
    error_3:
    free(ptr);
    error_2:
    pthread_rwlock_unlock(&tab_lock);
    #else
    error_2:
    pthread_mutex_unlock(mutex_table[lockid].mutex);
    #endif
    error_1:
    pthread_mutex_unlock(&mtx_recover);
    return(result);
  }

Because FreeBSD is an evolving operating system, like Linux, I have made provisions to allow for the use of robust mutexes in the future. Without robust mutexes, there really is no way to do the enhanced error checking that is available when they are supported.

For a robust mutex, enhanced error checking is performed to verify the need to recover the mutex. For systems that do not support robust mutexes, we have to trust the caller to verify that the mutex in question needs to be recovered. In addition, there is some checking to make sure that only one thread performs the recovery; all other threads waiting on the mutex remain blocked. I have given some thought to how to signal other threads that a recovery is in progress, but that aspect of the routine still needs work. In a recovery situation, I'm thinking about comparing pointer values to see if the mutex was replaced.

In both cases, an audit routine can be set as a callback function. The purpose of the audit routine is to verify and correct any data discrepancies in the protected data. If the audit fails to correct the data, then another callback routine, the data reinitialize routine, is invoked. The purpose of this is to reinitialize the data that is protected by the mutex. If that fails, then abort() is called to terminate program execution and drop a core file for debugging purposes.

For the abandoned mutex case, the pointer is not thrown away, but is placed on a list. If too many mutexes are abandoned, then the program is aborted. As mentioned above, in the mutex lock routine, pthread_mutex_trylock is used instead of pthread_mutex_lock. This way, no thread can be permanently blocked on a dead mutex. So once the pointer is switched in the mutex table to point to the new mutex, all threads waiting on the mutex will immediately switch to the new mutex.
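To make the trylock-and-switch idea concrete, here is a minimal sketch. It is not the TSRA code above: struct slot, slot_lock, slot_replace and abandon_demo are hypothetical names made up for illustration, and I am assuming a read/write lock guards the mutex pointer. The waiter re-reads the pointer on every attempt, so it moves to the replacement mutex as soon as the pointer is swapped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical table slot: recovery may swap the mutex pointer. */
struct slot {
    pthread_rwlock_t tab_lock;   /* guards the pointer, not the data */
    pthread_mutex_t *mutex;      /* current mutex for this slot */
};

/* Spin with trylock, re-reading the pointer on every attempt.
   On success *held is the mutex actually locked; unlock that one. */
int slot_lock(struct slot *s, pthread_mutex_t **held)
{
    assert(s != NULL && held != NULL);
    for (;;) {
        pthread_mutex_t *m;
        int rc;

        pthread_rwlock_rdlock(&s->tab_lock);
        m = s->mutex;
        rc = pthread_mutex_trylock(m);
        pthread_rwlock_unlock(&s->tab_lock);

        if (rc == 0) {
            *held = m;
            return 0;
        }
        if (rc != EBUSY)
            return rc;
        sched_yield();           /* give up the CPU, then retry */
    }
}

/* Recovery side: point the slot at a fresh mutex; the old one is abandoned. */
void slot_replace(struct slot *s, pthread_mutex_t *fresh)
{
    pthread_rwlock_wrlock(&s->tab_lock);
    s->mutex = fresh;
    pthread_rwlock_unlock(&s->tab_lock);
}

static pthread_mutex_t old_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t new_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct slot demo = { PTHREAD_RWLOCK_INITIALIZER, &old_mtx };

/* Simulates the crashed thread: locks the slot and exits without unlocking. */
static void *dies_holding(void *arg)
{
    pthread_mutex_t *held = NULL;
    (void)arg;
    slot_lock(&demo, &held);
    return NULL;
}

/* Waits on the slot and reports which mutex it finally obtained. */
static void *waiter(void *arg)
{
    pthread_mutex_t *held = NULL;
    (void)arg;
    if (slot_lock(&demo, &held) == 0)
        pthread_mutex_unlock(held);
    return held;
}

/* Returns 1 if the waiter ended up on the replacement mutex. */
int abandon_demo(void)
{
    pthread_t t1, t2;
    void *got = NULL;

    if (pthread_create(&t1, NULL, dies_holding, NULL) != 0) return -1;
    pthread_join(t1, NULL);
    if (pthread_create(&t2, NULL, waiter, NULL) != 0) return -1;
    slot_replace(&demo, &new_mtx);   /* old_mtx stays locked forever */
    pthread_join(t2, &got);
    return got == (void *)&new_mtx;
}
```

Returning the locked mutex through held matters: if the pointer is swapped while a thread is inside its critical section, that thread still has to unlock the mutex it actually took, not whatever the table points at now.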

I am sure there are bugs/errors in this code, but this is a work in progress. Although not quite finished and debugged, I feel that there is enough here to warrant an answer to this question.

Upvotes: 1

anakin

Reputation: 589

You could restart just the process with the crashed thread by using a function from the exec family to replace the process image. I assume that reloading the process will be faster than rebooting the server.

Upvotes: 0

Andrew Henle

Reputation: 1

Use a robust mutex, and if the locking thread dies, fix the mutex with pthread_mutex_consistent().

If mutex is a robust mutex in an inconsistent state, the pthread_mutex_consistent() function can be used to mark the state protected by the mutex referenced by mutex as consistent again.

If an owner of a robust mutex terminates while holding the mutex, the mutex becomes inconsistent and the next thread that acquires the mutex lock shall be notified of the state by the return value [EOWNERDEAD]. In this case, the mutex does not become normally usable again until the state is marked consistent.

If the thread which acquired the mutex lock with the return value [EOWNERDEAD] terminates before calling either pthread_mutex_consistent() or pthread_mutex_unlock(), the next thread that acquires the mutex lock shall be notified about the state of the mutex by the return value [EOWNERDEAD].
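As a minimal sketch of that sequence, assuming a platform that actually implements robust mutexes (for example Linux with glibc; the asker reports that FreeBSD does not): a thread exits while holding the lock, the next locker sees EOWNERDEAD, marks the mutex consistent and unlocks it. The names robust_demo and die_holding_lock are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t robust_mtx;

/* Thread body: take the lock, then terminate without releasing it. */
static void *die_holding_lock(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&robust_mtx);
    return NULL;
}

/* Hypothetical demo: returns what the recovering lock call returned
   (EOWNERDEAD is expected where robust mutexes are implemented). */
int robust_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    int rc, lock_rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc != 0)
        return rc;
    rc = pthread_mutex_init(&robust_mtx, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_create(&tid, NULL, die_holding_lock, NULL);
    if (rc != 0)
        return rc;
    pthread_join(tid, NULL);

    lock_rc = pthread_mutex_lock(&robust_mtx);
    if (lock_rc == EOWNERDEAD) {
        /* The lock IS held at this point. Audit or repair the protected
           data here, then mark the mutex usable again. */
        pthread_mutex_consistent(&robust_mtx);
    }
    if (lock_rc == 0 || lock_rc == EOWNERDEAD)
        pthread_mutex_unlock(&robust_mtx);
    pthread_mutex_destroy(&robust_mtx);
    return lock_rc;
}
```

Note that EOWNERDEAD is a successful acquisition, not a failure: the caller owns the mutex and must still unlock it after calling pthread_mutex_consistent().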

Upvotes: 7

nos

Reputation: 229058

Well, you cannot do what you ask with a normal pthread mutex, since, as you say, you can only unlock a mutex from the thread that locked it.

What you can do is wrap locking/unlocking of a mutex such that you have a pthread cancel handler that unlocks the mutex if the thread terminates. To give you an idea:

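void cancel_unlock_handler(void *p)
{
    pthread_mutex_unlock(p);
}

/* Note: pthread_cleanup_push() and pthread_cleanup_pop() must be paired
   in the same lexical scope, so in real code these two wrappers have to
   be macros (or one function); they are split here only to show the idea. */
int my_pthread_mutex_lock(pthread_mutex_t *m)
{
    int rc;
    pthread_cleanup_push(cancel_unlock_handler, m);
    rc = pthread_mutex_lock(m);
    if (rc != 0) {
        pthread_cleanup_pop(0);
    }
    return rc;
}

int my_pthread_mutex_unlock(pthread_mutex_t *m)
{
    pthread_cleanup_pop(0);
    return pthread_mutex_unlock(m);
}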

Now you'll need to use the my_pthread_mutex_lock/my_pthread_mutex_unlock instead of the pthread lock/unlock functions.
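One caveat with the wrappers above: POSIX requires pthread_cleanup_push and pthread_cleanup_pop to appear as a pair in the same lexical scope, and on common implementations they are macros that will not compile when split across two functions. A sketch of one way around that, assuming you can pass the critical section as a callback (with_mutex, unlock_on_exit and cleanup_demo are hypothetical names, not part of pthreads):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Cleanup handler: runs in the exiting thread, which owns the mutex. */
static void unlock_on_exit(void *p)
{
    pthread_mutex_unlock(p);
}

/* Hypothetical helper: run fn(arg) with m held. If fn calls
   pthread_exit() or the thread is cancelled inside fn, the cleanup
   handler releases m on the way out. */
int with_mutex(pthread_mutex_t *m, void (*fn)(void *), void *arg)
{
    int rc;

    assert(m != NULL && fn != NULL);
    rc = pthread_mutex_lock(m);
    if (rc != 0)
        return rc;
    pthread_cleanup_push(unlock_on_exit, m);
    fn(arg);
    pthread_cleanup_pop(1);      /* pop the handler and run it: unlocks m */
    return 0;
}

static pthread_mutex_t demo_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Critical section that terminates the thread halfway through. */
static void exit_inside(void *arg)
{
    (void)arg;
    pthread_exit(NULL);
}

static void *worker(void *arg)
{
    (void)arg;
    with_mutex(&demo_mtx, exit_inside, NULL);
    return NULL;
}

/* Returns 0 if the mutex is free after the worker exited mid-section. */
int cleanup_demo(void)
{
    pthread_t tid;
    int rc;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);
    rc = pthread_mutex_trylock(&demo_mtx);
    if (rc == 0)
        pthread_mutex_unlock(&demo_mtx);
    return rc;
}
```

The handler is pushed only after the lock succeeds, so it never tries to unlock a mutex the thread does not own.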

Now, threads don't really terminate "unexpectedly": a thread either calls pthread_exit, returns from its start routine, or is cancelled with pthread_cancel, and for pthread_exit and cancellation the cleanup handler above will suffice. (Also note that, with the default deferred cancellation type, a cancelled thread exits only at cancellation points, so there is no race between pushing the cleanup handler and locking the mutex.) However, a logic error or undefined behavior might leave erroneous state that affects the whole process, and in that case you're better off restarting the whole process.

Upvotes: 1

kspviswa

Reputation: 657

As you are probably aware, the thread that locks a mutex has sole ownership of it, so only that thread has the right to unlock it. There is no way, at least so far, to force a thread to give up a mutex without resorting to a roundabout approach like the one in your code.

However, this would be my approach.

Have a single thread, call it the resource thread, that owns the mutex. Make sure this thread receives and responds to events from the other worker threads.

When a worker thread wants to enter the critical section, it registers with the resource thread, which locks the mutex on its behalf. Once the request is granted, the worker thread assumes that it has exclusive access to the critical section. The assumption is valid because any other worker thread that needs access to the critical section has to go through the same step.

Now assume that another thread wants to force the former worker thread to unlock. It can make a special request, perhaps with a flag or a higher priority, to be granted access. The resource thread, on comparing the flag or priority of the requesting thread, will unlock the mutex and lock it again for the requesting thread.

I don't fully know your use case, but that's my 2 cents. If you like it, don't forget to vote for my answer.
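A rough sketch of the underlying idea, in a simpler form than a dedicated resource thread: ownership of the critical section is recorded as plain data behind a small internal mutex, so a supervisor can revoke it. Everything here (struct managed_lock, the managed_* functions, the integer worker ids) is hypothetical, and the sketch ignores thread cancellation.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical lock: "locked" is data guarded by an internal mutex that
   is only ever held briefly, never across user code. */
struct managed_lock {
    pthread_mutex_t guard;
    pthread_cond_t  freed;
    int             locked;
    int             owner_id;    /* caller-chosen id, valid while locked */
};

#define MANAGED_LOCK_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Block until the lock is free, then record worker_id as the owner. */
void managed_acquire(struct managed_lock *l, int worker_id)
{
    assert(l != NULL);
    pthread_mutex_lock(&l->guard);
    while (l->locked)
        pthread_cond_wait(&l->freed, &l->guard);
    l->locked = 1;
    l->owner_id = worker_id;
    pthread_mutex_unlock(&l->guard);
}

/* Normal release: only the recorded owner may release (else EPERM). */
int managed_release(struct managed_lock *l, int worker_id)
{
    int rc = EPERM;

    pthread_mutex_lock(&l->guard);
    if (l->locked && l->owner_id == worker_id) {
        l->locked = 0;
        pthread_cond_signal(&l->freed);
        rc = 0;
    }
    pthread_mutex_unlock(&l->guard);
    return rc;
}

/* Forced release: any thread may call this, e.g. after the owner died. */
void managed_force_release(struct managed_lock *l)
{
    pthread_mutex_lock(&l->guard);
    l->locked = 0;
    pthread_cond_signal(&l->freed);
    pthread_mutex_unlock(&l->guard);
}

static struct managed_lock demo_lock = MANAGED_LOCK_INIT;

/* Worker 1 takes the lock and terminates without releasing it. */
static void *dies_as_owner(void *arg)
{
    (void)arg;
    managed_acquire(&demo_lock, 1);
    return NULL;
}

/* Returns 0 if worker 2 was refused first, then got in after a force. */
int managed_demo(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, dies_as_owner, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);
    if (managed_release(&demo_lock, 2) != EPERM)
        return -2;
    managed_force_release(&demo_lock);
    managed_acquire(&demo_lock, 2);  /* would block forever if still held */
    return managed_release(&demo_lock, 2);
}
```

Forcing the lock free says nothing about the state of the protected data, so whoever calls managed_force_release still has to audit or reinitialize that data before letting other workers in.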

Upvotes: 0
