Reputation: 12007
I have the code:
for (int i = 0; i < (int)(kpts1.size()); i++) {
    perform_operation(kpts1[i], *kpts2[i]);
}
where kpts1 and kpts2 are std::vector<> types. The function perform_operation takes kpts1[i], performs an operation on it, and stores the result in kpts2[i].
It seems like I should be able to multithread this. Since each iteration of the for loop is independent of the others, I should be able to run them in parallel with as many threads as there are CPU cores, right?
I've seen several SO questions that kind of answer this, but they don't really get at how to parallelize a simple for loop, and I'm not sure whether reading from the same kpts1 variable and writing to the same kpts2 variable from several threads is possible.
Or am I misunderstanding something? Is this not parallelizable?
I'd be happy if I could find a solution in C++ or C, but right now I am stuck.
Upvotes: 2
Views: 932
Reputation: 14481
I believe you're asking whether you can operate on each element of the array in a separate thread.
You can. There are several considerations though.
As long as the separate operations don't affect each other, it's a good candidate for parallelism.
As a practical matter, standard CPU threading is slow to set up and eats up a good amount of memory (pthreads reserves a multi-megabyte stack per thread by default, commonly 8 MB on Linux). If the tasks are fairly intensive, the time saved outweighs the setup overhead. If not, the threaded version is harder to code, bigger, and slower than doing it the straightforward way.
Intel TBB is one option. NVIDIA CUDA is another.
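One way to keep the setup cost down without pulling in a library is to start only about as many threads as there are cores and give each one a contiguous slice of the loop. The following is a minimal sketch using std::thread, not a definitive implementation. The element type (int), the body of perform_operation, and the name parallel_apply are assumptions made for illustration; the real perform_operation signature is not shown in the question. Writing to distinct elements of a std::vector from different threads is safe (std::vector<bool> is the exception), as long as nothing resizes the vector while the threads run.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's perform_operation:
// reads one input element and writes one output element.
void perform_operation(const int& in, int& out) {
    out = in * 2;
}

// Hypothetical helper: splits [0, in.size()) into contiguous chunks and
// processes each chunk on its own thread.
void parallel_apply(const std::vector<int>& in, std::vector<int>& out) {
    assert(in.size() == out.size());
    const std::size_t n = in.size();
    if (n == 0) return;

    // hardware_concurrency() may return 0 when the core count is unknown.
    std::size_t workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 1;
    workers = std::min(workers, n);

    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> threads;
    threads.reserve(workers);

    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        // Each thread touches only indices in [begin, end), so no two
        // threads ever write to the same element.
        threads.emplace_back([&in, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                perform_operation(in[i], out[i]);
            }
        });
    }

    // Wait for every worker before returning.
    for (std::thread& t : threads) t.join();
}
```

Calling parallel_apply(kpts1, kpts2) with two equally sized vectors would then replace the original loop. For very cheap operations or short vectors, the plain loop is still likely to be faster.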
Upvotes: 1
Reputation: 25179
Provided each perform_operation call operates independently of the others, then yes, this is parallelizable.
Rather than simply calling perform_operation, start a new thread (with pthread_create). You will need to wrap the parameters in a single struct (it could just hold pointers to both arguments), and pass a wrapper around perform_operation as the start_routine. That will create the relevant number of threads. Then, in a second for loop, use pthread_join to wait for the threads you have created to exit.
That's a rough outline. Obviously some error handling would be useful, and you might want each thread to perform a number of perform_operation calls serially, rather than using one thread per item. But you should get the basic idea from the above.
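The outline above might look roughly like the sketch below: one thread per item, a struct holding pointers to both arguments, a wrapper used as the start_routine, and a second loop that joins. The element type (int), the body of perform_operation, and the names TaskArgs, run_task, and apply_with_pthreads are assumptions made for illustration, since the question does not show the real signature. It is written as C++ calling the pthreads C API and would typically need to be built with -pthread.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <vector>

// Hypothetical stand-in for the question's perform_operation.
void perform_operation(const int& in, int& out) {
    out = in * 2;
}

// pthread_create passes a single void*, so both arguments are wrapped
// in one struct of pointers.
struct TaskArgs {
    const int* in;
    int* out;
};

// Wrapper with the signature pthread_create expects for start_routine.
extern "C" void* run_task(void* raw) {
    TaskArgs* args = static_cast<TaskArgs*>(raw);
    perform_operation(*args->in, *args->out);
    return nullptr;
}

// Hypothetical helper: one thread per element. Returns false if some
// thread could not be created; elements from that point on are left untouched.
bool apply_with_pthreads(const std::vector<int>& in, std::vector<int>& out) {
    assert(in.size() == out.size());
    const std::size_t n = in.size();
    std::vector<TaskArgs> args(n);      // must outlive the threads
    std::vector<pthread_t> threads(n);

    // First loop: create the threads.
    std::size_t started = 0;
    for (; started < n; ++started) {
        args[started] = TaskArgs{&in[started], &out[started]};
        if (pthread_create(&threads[started], nullptr, run_task,
                           &args[started]) != 0) {
            break;  // creation failed; stop spawning more threads
        }
    }

    // Second loop: wait for every thread that was actually started.
    for (std::size_t i = 0; i < started; ++i) {
        pthread_join(threads[i], nullptr);
    }
    return started == n;
}
```

With one thread per element this is only sensible for a small number of expensive items; for long vectors, giving each thread a range of indices to process serially keeps the thread count near the core count.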
Upvotes: 1