Reputation: 529
I have the following piece of code:
#include "stdio.h"
#include "stdlib.h"
#include <string.h>
#define MAXBINS 8
void swap_long(unsigned long int **x, unsigned long int **y){
unsigned long int *tmp;
tmp = x[0];
x[0] = y[0];
y[0] = tmp;
}
void swap(unsigned int **x, unsigned int **y){
unsigned int *tmp;
tmp = x[0];
x[0] = y[0];
y[0] = tmp;
}
void truncated_radix_sort(unsigned long int *morton_codes,
unsigned long int *sorted_morton_codes,
unsigned int *permutation_vector,
unsigned int *index,
int *level_record,
int N,
int population_threshold,
int sft, int lv){
int BinSizes[MAXBINS] = {0};
unsigned int *tmp_ptr;
unsigned long int *tmp_code;
level_record[0] = lv; // record the level of the node
if(N<=population_threshold || sft < 0) { // Base case. The node is a leaf
memcpy(permutation_vector, index, N*sizeof(unsigned int)); // Copy the pernutation vector
memcpy(sorted_morton_codes, morton_codes, N*sizeof(unsigned long int)); // Copy the Morton codes
return;
}
else{
// Find which child each point belongs to
int j = 0;
for(j=0; j<N; j++){
unsigned int ii = (morton_codes[j]>>sft) & 0x07;
BinSizes[ii]++;
}
// scan prefix
int offset = 0, i = 0;
for(i=0; i<MAXBINS; i++){
int ss = BinSizes[i];
BinSizes[i] = offset;
offset += ss;
}
for(j=0; j<N; j++){
unsigned int ii = (morton_codes[j]>>sft) & 0x07;
permutation_vector[BinSizes[ii]] = index[j];
sorted_morton_codes[BinSizes[ii]] = morton_codes[j];
BinSizes[ii]++;
}
//swap the index pointers
swap(&index, &permutation_vector);
//swap the code pointers
swap_long(&morton_codes, &sorted_morton_codes);
/* Call the function recursively to split the lower levels */
offset = 0;
for(i=0; i<MAXBINS; i++){
int size = BinSizes[i] - offset;
truncated_radix_sort(&morton_codes[offset],
&sorted_morton_codes[offset],
&permutation_vector[offset],
&index[offset], &level_record[offset],
size,
population_threshold,
sft-3, lv+1);
offset += size;
}
}
}
I tried to make this block
int j = 0;
for(j=0; j<N; j++){
    unsigned int ii = (morton_codes[j]>>sft) & 0x07;
    BinSizes[ii]++;
}
parallel by substituting it with the following:
int rc, j;
pthread_t *thread = (pthread_t *)malloc(NTHREADS*sizeof(pthread_t));
belong *belongs = (belong *)malloc(NTHREADS*sizeof(belong));
pthread_mutex_init(&bin_mtx, NULL);

for (j = 0; j < NTHREADS; j++){
    belongs[j].n = NTHREADS;
    belongs[j].N = N;
    belongs[j].tid = j;
    belongs[j].sft = sft;
    belongs[j].BinSizes = BinSizes;
    belongs[j].mcodes = morton_codes;
    rc = pthread_create(&thread[j], NULL, belong_wrapper, (void *)&belongs[j]);
}

for (j = 0; j < NTHREADS; j++){
    rc = pthread_join(thread[j], NULL);
}
and by defining these outside the recursive function:
#include <pthread.h>

typedef struct{
    int n, N, tid, sft;
    int *BinSizes;
    unsigned long int *mcodes;
} belong;

pthread_mutex_t bin_mtx;

void *belong_wrapper(void *arg){
    int n, N, tid, sft, j;
    int *BinSizes;
    unsigned int ii;
    unsigned long int *mcodes;

    n = ((belong *)arg)->n;
    N = ((belong *)arg)->N;
    tid = ((belong *)arg)->tid;
    sft = ((belong *)arg)->sft;
    BinSizes = ((belong *)arg)->BinSizes;
    mcodes = ((belong *)arg)->mcodes;

    for (j = tid; j < N; j += n){
        ii = (mcodes[j] >> sft) & 0x07;
        pthread_mutex_lock(&bin_mtx);
        BinSizes[ii]++;
        pthread_mutex_unlock(&bin_mtx);
    }

    return NULL;
}
However, it takes a lot more time to execute than the serial version... Why is this happening? What should I change?
Upvotes: 0
Views: 1872
Reputation: 2282
Since you're using a single mutex to guard updates to the BinSizes array, you're still ultimately doing all the updates to this array sequentially: only one thread can execute BinSizes[ii]++ at any given time. Basically you're still executing your function in sequence, but incurring the extra overhead of creating and destroying threads.
There are several options I can think of for you (there are probably more):

- Divide the work up so that each thread is responsible for a disjoint subset of BinSizes and needs no locking at all. This might not be viable depending on the properties of the calculation you're using to compute ii.
- Create multiple mutexes representing different partitions of BinSizes. For example, if BinSizes has 10 elements, you could create one mutex for elements 0-4 and another for elements 5-9, then use them in your thread something like so:
    /* assumes bin_mtx has been redeclared as an array: pthread_mutex_t bin_mtx[2]; */
    int mtx_index;
    if (ii < 5) {
        mtx_index = 0;
    } else {
        mtx_index = 1;
    }
    pthread_mutex_lock(&bin_mtx[mtx_index]);
    BinSizes[ii]++;
    pthread_mutex_unlock(&bin_mtx[mtx_index]);
You could generalize this idea to any size of BinSizes and any range: potentially you could have a different mutex for each array element (a rough sketch of that variant follows after this list). Of course, then you're opening yourself up to the overhead of creating each of these mutexes, and the possibility of deadlock if someone tries to lock several of them at once, etc.
Finally, you could abandon the idea of parallelizing this block altogether: as other users have mentioned, using threads this way is subject to some level of diminishing returns. Unless your BinSizes array is very large, you might not see a huge benefit to parallelization even if you "do it right".
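As a rough illustration of the per-element variant mentioned above (the function names init_bin_mutexes and count_one are illustrative, not part of the question's code): each bin gets its own lock, so two threads only contend when they happen to hit the same bin.

    #include <pthread.h>

    #define MAXBINS 8

    /* one mutex per bin; initialize them once before any thread starts */
    pthread_mutex_t bin_mtx[MAXBINS];

    void init_bin_mutexes(void)
    {
        int i;
        for (i = 0; i < MAXBINS; i++)
            pthread_mutex_init(&bin_mtx[i], NULL);
    }

    /* inside the worker loop, lock only the bin actually being updated */
    void count_one(int *BinSizes, unsigned long int code, int sft)
    {
        unsigned int ii = (code >> sft) & 0x07;
        pthread_mutex_lock(&bin_mtx[ii]);
        BinSizes[ii]++;
        pthread_mutex_unlock(&bin_mtx[ii]);
    }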
Upvotes: 1
Reputation: 67802
tl;dr - adding threads isn't a trivial fix for most problems. Yours isn't embarrassingly parallelizable, and this code has hardly any actual concurrency.
You spin a mutex for every (cheap) integer operation on BinSizes. This will crush any parallelism, because all your threads are serialized on it.
The few instructions you can run concurrently (the for loop and a couple of operations on the morton code array) are much cheaper than (un)locking a mutex: even using an atomic increment (if available) would be more expensive than the un-synchronized part.
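For reference, a sketch of what that atomic increment could look like inside belong_wrapper, assuming a GCC or Clang compiler where the __atomic builtins are available (this is not part of the original code, and the point above is that even this will likely cost more than the unsynchronized counting):

    /* replaces the mutex-protected body of the loop in belong_wrapper;
       __atomic_fetch_add is a GCC/Clang builtin, so this assumes one of
       those compilers */
    for (j = tid; j < N; j += n) {
        ii = (mcodes[j] >> sft) & 0x07;
        __atomic_fetch_add(&BinSizes[ii], 1, __ATOMIC_RELAXED);
    }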
One fix would be to give each thread its own output array, and combine them after all tasks are complete.
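A minimal sketch of that approach, reusing the belong struct and belong_wrapper from the question; the local_bins field and the reduction loop are illustrative additions, not code from the question or the answer. Each thread counts into its own private array with no locking, and the main thread sums the per-thread arrays after pthread_join:

    typedef struct {
        int n, N, tid, sft;
        int local_bins[MAXBINS];     /* private counts for this thread */
        unsigned long int *mcodes;
    } belong;

    void *belong_wrapper(void *arg)
    {
        belong *b = (belong *)arg;
        int j;
        for (j = b->tid; j < b->N; j += b->n) {
            unsigned int ii = (b->mcodes[j] >> b->sft) & 0x07;
            b->local_bins[ii]++;     /* no mutex needed: the array is thread-private */
        }
        return NULL;
    }

    /* in truncated_radix_sort, after the pthread_join loop; assumes each
       belongs[j].local_bins was zeroed (e.g. with memset) before pthread_create */
    for (j = 0; j < NTHREADS; j++) {
        int i;
        for (i = 0; i < MAXBINS; i++)
            BinSizes[i] += belongs[j].local_bins[i];
    }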
Also, you create and join multiple threads per call. Creating threads is relatively expensive compared to computation, so it's generally recommended to create a long-lived pool of them to spread that cost.
Even if you do this, you need to tune the number of threads according to how many (free) cores you have. If you do this in a recursive function, how many threads exist at the same time? Creating more threads than you have cores to schedule them on is pointless.
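One common way to pick that count on Linux/glibc (an illustrative addition, not from the answer) is to ask the OS how many cores are online instead of hard-coding NTHREADS:

    #include <unistd.h>

    /* sysconf(_SC_NPROCESSORS_ONLN) reports the number of online cores on
       Linux/glibc and most modern Unixes; fall back to 1 if it fails */
    long nthreads = sysconf(_SC_NPROCESSORS_ONLN);
    if (nthreads < 1)
        nthreads = 1;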
Oh, and you're leaking memory.
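Concretely, that refers to the malloc'd thread and belongs arrays in the question, which are never freed; a minimal cleanup after the join loop would be:

    /* after the pthread_join loop in the question's code */
    pthread_mutex_destroy(&bin_mtx);
    free(belongs);
    free(thread);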
Upvotes: 0