Reputation: 483
I am writing an image-processing filter, and I want to speed up the computations using OpenMP. The structure of my pseudo-code looks like this:
for (every pixel in the image) {
    // do some stuff here
    for (every combination of parameters) {
        // do other stuff here and filter
    }
}
The code filters every pixel with a range of different parameters and chooses the optimal ones.
My question is: which is faster, parallelizing the outer loop over the pixels among the processors, or visiting the pixels sequentially and parallelizing the parameter selection in the inner loop?
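Concretely, the two options look something like this (a minimal OpenMP sketch; score_pixel, n_pixels and n_params are placeholders for my real code):

void score_pixel(int p, int k);   /* placeholder for the real filter work */

/* Option A: one parallel region; the pixel loop is split across threads */
void filter_outer(int n_pixels, int n_params)
{
    #pragma omp parallel for
    for (int p = 0; p < n_pixels; p++)
        for (int k = 0; k < n_params; k++)
            score_pixel(p, k);
}

/* Option B: pixels visited sequentially; the parameter loop is split per pixel */
void filter_inner(int n_pixels, int n_params)
{
    for (int p = 0; p < n_pixels; p++) {
        #pragma omp parallel for
        for (int k = 0; k < n_params; k++)
            score_pixel(p, k);
    }
}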
I think the question could be stated more generally: which is faster, giving each thread a large amount of work, or creating many threads that each do only a little work?
I'm not concerned with the implementation details for now; I think I can handle those with my previous OpenMP experience. Thanks!
Upvotes: 4
Views: 1884
Reputation: 8986
Your goal is to distribute the data evenly over the available processors. Split the image up (the outer loop) evenly, with one thread per processor core. Experiment with fine- and coarse-grained parallelism to see which gives the best results. Once the number of threads exceeds the number of available cores, you will start to see performance degradation.
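As a minimal sketch of that approach (process_pixel and the image dimensions are placeholders, not the asker's actual code):

#include <omp.h>

void process_pixel(int x, int y);   /* placeholder for the per-pixel work */

void filter_image(int height, int width)
{
    /* one thread per core; most OpenMP runtimes default to this anyway */
    omp_set_num_threads(omp_get_num_procs());

    /* schedule(static) is coarse grain: one contiguous block of rows per
       thread; try schedule(dynamic, 4) or smaller chunks for finer grain */
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            process_pixel(x, y);
}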
Upvotes: 4
Reputation: 507
which is faster, giving each thread a large amount of work, or creating many threads that each do only a little work
Creating a new thread takes a significant amount of time and resources, so it's better to create a few threads with longer tasks.
It also depends on your algorithm: if you access the disk or memory too often, the threads will be suspended frequently, so it can pay to use a few more threads than cores to keep the processors busy.
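One way to see that cost directly is to time both nesting orders with omp_get_wtime() (a sketch; score_pixel is a placeholder, and note that most OpenMP runtimes reuse a thread pool, so the per-pixel cost is mostly region start-up and the implicit barrier rather than literal thread creation):

#include <omp.h>
#include <stdio.h>

void score_pixel(int p, int k);   /* stands in for the real filter work */

void compare_grain(int n_pixels, int n_params)
{
    double t0 = omp_get_wtime();
    #pragma omp parallel for        /* one region for the whole image */
    for (int p = 0; p < n_pixels; p++)
        for (int k = 0; k < n_params; k++)
            score_pixel(p, k);
    double t_coarse = omp_get_wtime() - t0;

    t0 = omp_get_wtime();
    for (int p = 0; p < n_pixels; p++) {
        #pragma omp parallel for    /* region re-entered once per pixel */
        for (int k = 0; k < n_params; k++)
            score_pixel(p, k);
    }
    double t_fine = omp_get_wtime() - t0;

    printf("coarse: %.3f s, fine: %.3f s\n", t_coarse, t_fine);
}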
Upvotes: 4
Reputation: 931
There tends to be substantial overhead in thread creation and scheduling. In general you want to give each thread enough work that the overhead of creating it is absorbed by the "win" of introducing multithreading.
Additionally, assuming you have sufficiently many pixels, it's a good idea to make sure each thread accesses its pixels sequentially. That is better for caching and ensures the data is already where you want it to be; repeatedly loading from main memory will hurt your parallelization win, too.
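A sketch of what that can look like, assuming a row-major 8-bit image buffer (best_filter is a placeholder): schedule(static) hands each thread one contiguous strip of rows to walk sequentially.

#include <stddef.h>

unsigned char best_filter(unsigned char v);   /* placeholder filter */

void filter_rows(unsigned char *img, int height, int width)
{
    /* schedule(static) gives each thread a contiguous block of rows, so
       every thread walks its strip of the buffer in memory order */
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < height; y++) {
        unsigned char *row = img + (size_t)y * width;
        for (int x = 0; x < width; x++)
            row[x] = best_filter(row[x]);
    }
}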
Upvotes: 4