Reputation: 473
The idea is to write a program that accepts a count of random numbers to generate, divides the load among however many threads the user specifies, and measures the speed-up from using multiple threads. My issue, however, is that the more threads I add, the slower my program runs. Not sure what is wrong. Here is a snippet of my code so far:
...
    for (i = 0; i < numThreads; i++) {
        vals *values = (vals *)malloc(sizeof(vals));
        values->randoms = count;   /* numbers for this thread to generate */
        values->id = i;            /* unique id, used as the seed */
        pthread_create(&tid[i], NULL, run, (void *)values);
    }
    for (i = 0; i < numThreads; i++)
        pthread_join(tid[i], NULL);
    timeElapsed = getMilliSeconds() - timeStart;
    printf("Elapsed time: %lf seconds\n", (double)(timeElapsed / 1000.0));
    exit(EXIT_SUCCESS);
}
void *run(void *arg) {
    vals *values = (vals *)arg;
    long long int i;
    long long int randoms = values->randoms;
    srandom(values->id);        /* note: seeds random()'s shared global state */
    for (i = 0; i < randoms; i++) {
        random();
    }
    free(values);               /* the malloc'd struct was leaked originally */
    pthread_exit(NULL);
}
vals is a struct which holds two values (randoms and id). randoms contains the number of random numbers to generate divided by the number of threads (to divide the load), and id holds a unique id for each thread, used as a seed. I needed the struct so I could pass multiple values to the worker function each thread runs.
Any ideas why it would run slower with more threads?
Upvotes: 1
Views: 204
Reputation: 133975
Quite possibly you're encountering false sharing. Generating a random number involves mutating some shared state, and multiple threads continually modifying the same values effectively eliminates any benefit you get from the CPU's memory cache. What happens is that every time Thread A wants to access that shared state, it has to wait for Thread B's CPU core to flush its cache. And any time Thread B wants to access it, it has to wait for Thread A's CPU core to flush its cache.
Looked at another way, a single threaded program would do something like:
Load state into CPU cache
for (i = 0 to randoms ...)
generate random number
With two threads, each one is doing this:
for (i = 0 to randoms ...)
wait for other CPU core to flush its cache
generate random number
My issue, however, is that the more threads I add, the slower my program goes.
If you have more processing threads than CPU cores, then your program is going to slow down. With two cores, the absolute best you can do with a compute-bound operation is to run twice as fast as the single-threaded solution. If you have three threads, then at some point the thread scheduler will have to stop one of them so that the third thread can get some time. These context switches take time, and that cost is relatively large in the context of a compute-bound operation. In general, you don't want to have more compute-bound threads than CPU cores.
(Absent hyperthreading, of course. With hyperthreading, you could potentially have four threads running concurrently, although you're unlikely to get even a 3x improvement.)
Upvotes: 2
Reputation: 6003
A multi-threaded program may show improved performance in an environment where multiple CPUs are available. However, when CPU resources are scarce, each thread has to wait to be scheduled for CPU time. A 'context switch', where one thread is switched out of a CPU and another switched in, is not an insignificant task.
Hence, the more threads, the more threads are waiting for CPU resources, and the more time the kernel spends doing context switches (instead of real work).
Upvotes: 2