Reputation: 80
I'm currently creating an algorithm for solving the 0-1 knapsack problem using dynamic programming with multiple threads. My approach is to solve the n
by capacity
dynamic programming table by dividing it into 4 n
by capacity / 4
parts which will be solved by 4 threads. Because of the knapsack having a dependency on the upper row, all the 4 threads need to be solving the same row every time. They must wait for all the threads to finish their parts on the current row before proceeding to the next row.
Here is my code:
int knapsackParallel(char *values, char *weights, int N, int capacity) {
for (int j = 0; j < 4; j++) {
arguments[j].cpuNum = j;
pthread_create(&tid[j], NULL, solveRow, (void *) &arguments[j]);
pthread_setaffinity_np(tid[j], sizeof(cpu_set_t), &cpusets[j]);
}
for (int j = 0; j < 4; j++) {
pthread_join(tid[j], NULL);
}
}
Here is the code of each thread that needs synchronization:
void *solveRow(void *arguments) {
int cpuNum = ((args *) arguments)->cpuNum;
int initial = ((args *) arguments)->cpuNum * (capacity / p);
int limit = cpuNum == p - 1 ? capacity + 1 : initial + (capacity / p);
for (int i = 0; i < N + 1; i++) {
for (int j = initial; j < limit; j++) {
// for the top and left edges, initialize to 0
if (i == 0 || j == 0)
table[i][j] = 0;
// find the max between putting the value in the knapsack or not
else if (weights[i] <= j)
table[i][j] = fmax(values[i] + table[i - 1][j - weights[i]], table[i - 1][j]);
// copy the value in the top of the cell
else
table[i][j] = table[i - 1][j];
}
// insert code here to wait for all the threads to finish their current iteration
// before proceeding to the next iteration
}
What I have tried so far:
Using multiple semaphores:
sem_post(&locks[cpuNum]);
sem_post(&locks[cpuNum]);
sem_post(&locks[cpuNum]);
for (int j = 0; j < p; j++) {
if (j == cpuNum) continue;
sem_wait(&locks[j]);
}
printf("thread %d done\n", cpuNum);
Using pthread_cond_wait
:
locks[cpuNum] = 1;
if (locks[0] && locks[1] && locks[2] && locks[3]) {
pthread_cond_broadcast(&full);
}
else {
pthread_cond_wait(&full, &lock);
}
locks[cpuNum] = 0;
Using futex
:
locks[cpuNum] = 1;
futex_wait(&locks[0], 0);
futex_wait(&locks[1], 0);
futex_wait(&locks[2], 0);
futex_wait(&locks[3], 0);
locks[cpuNum] = 0;
Upvotes: 0
Views: 75
Reputation: 385917
[Another answer has brought barriers to my attention. Using a barrier greatly reduces the code needed at no cost. Best to use that.]
If you had a var with the active row number, the threads could simply wait until that's updated.
#define NUM_WORKERS 4
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int active_row_num = 0;
static int outstanding_threads = NUM_WORKERS;
static void done_row( int row_num ) {
pthread_mutex_lock( &mutex );
if ( --outstanding_threads == 0 ) {
++active_row_num;
outstanding_threads = NUM_WORKERS;
pthread_cond_broadcast( &cond );
}
++row_num;
while ( row_num != active_row_num )
pthread_cond_wait( &cond, &mutex );
pthread_mutex_unlock( &mutex );
}
static void *worker( void *arg ) {
for ( int row_num=0; row_num<...; ++row_num ) {
...
done_row( row_num );
}
return NULL;
}
Upvotes: 1
Reputation: 58598
This sounds like a job for posix barriers: pthread_barrier_init
, pthread_barrier_wait
.
A barrier can solve some of the same problems as pthread_join
, but without actually having threads terminate and start again.
A barrier is initialized with a certain count. When that many threads call the wait operation, the barrier activates, and they are all unblocked. One of the threads receives a distinct return value, telling it it is the "serial thread": it can then perform some special job.
Thus each barrier_wait
point in the program's logic is a rendezvous where all threads meet, which guarantees that whatever activity they were cooperating on is in a completed state. If their activity needs to be integrated together (e.g. partial results combined into a full result), the serial thread can do that, followed by another barrier where all the threads rendezvous again, to kick off another bout of parallel work.
Upvotes: 2
Reputation: 76601
You seem to be doing well by calling pthread_join for the threads you aim to synchronize, because that assures that your main thread will wait until the other threads are done:
The problem seems to be that you are not calling pthread_exit in the threads that are to be synchronized:
Upvotes: 0