AlasdairC
AlasdairC

Reputation: 320

Mutual exclusion isn't exclusive

I have the following code which runs in 2 threads started by an init call from the main thread. One for writing to a device, one for reading. My app is called by other threads to add items to the queues. pop_queue handles all locking, as does push_queue. Whenever I modify a req r, I lock it's mutex. q->process is a function pointer to one of either write_sector, read_setor. I need to guard against simultaneous calls to the two function pointers, so I'm using a mutex on the actual process call, however this is not working.

According to the text program, I am making parallel calls to the process functions. How is that possible given I lock immediatly before and unlock immediately afterwards?

The following error from valgrind --tool=helgrind might help?

==3850== Possible data race during read of size 4 at 0xbea57efc by thread #2
==3850==    at 0x804A290: request_handler (diskdriver.c:239)

Line 239 is r->state = q->process(*device, &r->sd) +1

void *
request_handler(void *arg)
{
    req *r;
    queue *q = arg;
    int writing = !strcmp(q->name, "write");
    for(;;) {
        /*
         * wait for a request
         */
        pop_queue(q, &r, TRUE);

        /*
         * handle request
         * req r is unattached to any lists, but must lock it's properties incase being redeemed
         */
        printf("Info: driver: (%s) handling req %d\n", q->name, r->id);
        pthread_mutex_lock(&r->lock);

        pthread_mutex_lock(&q->processing);
        r->state = q->process(*device, &r->sd) +1;
        pthread_mutex_unlock(&q->processing);

        /* 
         * if writing, return the SectorDescriptor
         */
         if (writing) {
            printf("Info: driver (write thread) has released a sector descriptor.\n");
            blocking_put_sd(*sd_store, r->sd);
            r->sd = NULL;
        }

        pthread_mutex_unlock(&r->lock);
        pthread_cond_signal(&r->changed);

    }
}

EDIT

Here is the one other location where the req's properties are read

int redeem_voucher(Voucher v, SectorDescriptor *sd)
{
    int result;

    if (v == NULL){
        printf("Driver: null voucher redeemed!\n");
        return 0;
    }
    req *r = v;
    pthread_mutex_lock(&r->lock);

    /* if state = 0 job still running/queued */
    while(r->state==0) {
        printf("Driver: blocking for req %d to finish\n", r->id);
        pthread_cond_wait(&r->changed, &r->lock);
    }

    sd = &r->sd;
    result = r->state-1;
    r->sd = NULL;
    r->state = WAIT;
    //printf("Driver: req %d completed\n", r->id);
    pthread_mutex_unlock(&r->lock);
    /*
     * return req to pool
     */    
    push_queue(&pool_q, r);
    return result;
}

EDIT 2 here's the push_ and pop_queue functions

int
pop_queue(struct queue *q, req **r, int block)
{
    pthread_mutex_lock(&q->lock);
    while(q->head == NULL) {
        if(block) {
            pthread_cond_wait(&q->wait, &q->lock);
        }
        else {
            pthread_mutex_unlock(&q->lock);
            return FALSE;
        }
    }

    req *got = q->head;
    q->head = got->next;
    got->next = NULL;
    if(!q->head) {
        /* just removed last element */
        q->tail = q->head;
    }

    *r = got;
    pthread_mutex_unlock(&q->lock);
    return TRUE;
}

/*
 * perform a standard linked list insertion to the queue specified
 * handles all required locking and signals any listeners
 * return: int - if insertion was successful
 */
int
push_queue(queue *q, req *r)
{
    /*
     * push never blocks, 
     */
    if(!r || !q)
        return FALSE;

    pthread_mutex_lock(&q->lock);

    if(q->tail) {
        q->tail->next = r;
        q->tail = r;
    }
    else {
        /* was an empty queue */
        q->tail = q->head = r;
    }

    pthread_mutex_unlock(&q->lock);
    pthread_cond_signal(&q->wait);

    return TRUE;
}

Upvotes: 0

Views: 283

Answers (2)

Jens Gustedt
Jens Gustedt

Reputation: 78923

Your line

pthread_cond_signal(&r->changed);

let me suspect that you have other code that is also manipulating the structure pointed to by r. In any case it makes not much sense if you have nobody waiting for that condition variable. (And you should invert the unlock and signal lines.)

So, probably your error is just somewhere else, where you access r simultaneaously without taking the lock on the mutex. You didn't show us the rest of your code, so saying more would be even more guess work.

Upvotes: 0

Mark Wilkins
Mark Wilkins

Reputation: 41222

Based on the available information, it seems that a likely possibility then is that another thread is modifying the data pointed to by *device. Perhaps it is being modified while the q->processing mutex is not held.

Upvotes: 0

Related Questions