Reputation: 201
I have a following simple problem: I need to create three threads and in every thread do a certain operation. The first thread need to add 100
to array[0]
and subtract 101
from array[1]
, second thread need to add 200
to array[1]
and subtract 201
from array[2]
and in the end third thread need to add 300
to array[2]
and subtract 301
from array[0]
.
Below is right solution but run time is very huge. If I carry out this task using one thread run time will be less 1 second, but three thread increase run time to approximately 10 sec(+- 2 sec). What is the problem? I thought that solution with three threads must be faster. May be I use mutex in wrong way?
#include <stdio.h>
#include <pthread.h>
#include <limits.h>
enum {SIZE = 3, ITER = 1000000};
double array[SIZE] = {};
pthread_t threads[SIZE];
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
void *func(void *arg) {
int n = *(int *)arg;
int tmp = 100 * (n + 1),
tmp2 = tmp + 1;
for (int i = 0; i != ITER; ++i) {
pthread_mutex_lock(&mutex);
array[n % SIZE] += tmp;
array[(n + 1) % SIZE] -= tmp2;
pthread_mutex_unlock(&mutex);
}
return NULL;
}
int main() {
int n[SIZE];
for (int i = 0; i != SIZE; ++i) {
n[i] = i;
pthread_create(threads + i, NULL, func, n + i);
}
for (int i = 0; i != SIZE; ++i) {
pthread_join(threads[i], NULL);
}
for (int i = 0; i != SIZE; ++i) {
printf("%.10g ", array[i]);
}
printf("\n");
return 0;
}
Upvotes: 1
Views: 146
Reputation: 2878
Although mutex carries a significant overhead, that is not the reason for your slowdown. The reason is that the mutex serializes the execution. Only one thread can run at any point in time because only one can hold the mutex. So what you got in practice is a serial execution with an extra synchronization overhead.
You expected your threads to be running like this (X
represent a running thread).
Thread1: |X|X|X|X|X|
Thread2: |X|X|X|X|X|
Thread3: |X|X|X|X|X|
But instead, what you got is something like this:
Thread1: |X| | |X| | |X| | |X| | |X| | |
Thread2: | |X| | |X| | |X| | |X| | |X| |
Thread3: | | |X| | |X| | |X| | |X| | |X|
Where at each timeslot, only one thread runs.
There are several approaches for this:
You could partition your operations in a way that each thread access different indexes of the array exclusively. This way you could avoid using mutex at all. However, this will require a complete redesign of your solution.
You could make each thread work on a local copy of the array, then combine the results of all the threads on a single thread. This is much simpler because all you have to do is copy the data to the threads and get rid of the need for a mutex.
You could use a mutex for each array index, but that seems a bit extreme because of the high memory and time overhead, given that the collision probability is actually very low.
To conclude, using option 1 will yield the best performance, but using option 2 will yield a significant speedup compared to the serial version, with relatively little design effort. Option 3 is ill advised.
Upvotes: 5
Reputation: 896
When you use threads and mutexes you have to take into account the overhead. What is a overhead? I will give you an example:
Planes are much faster than cars, but the overhead of airport check-in and security lines make cars a better option for short distances.
So using the pthread_create
, pthread_mutex_lock(&mutex);
and pthread_mutex_unlock(&mutex);
creates overhead and in some cases it can make your program even slower than threadless program.
Upvotes: 2