Reputation: 1747
I am using OpenMP to parallelize a for loop. I am trying to access a C++ Armadillo vector by thread id, but I am wondering if I have to put the access in a critical section even if the different threads access disjoint areas of memory. This is my code:
#include <armadillo>
#include <omp.h>
#include <iostream>
int main()
{
arma::mat A = arma::randu<arma::mat>(1000,700);
arma::rowvec point = A.row(0);
arma::vec distances = arma::zeros(omp_get_max_threads());
#pragma omp parallel shared(A,point,distances)
{
arma::vec local_distances = arma::zeros(omp_get_num_threads());
int thread_id = omp_get_thread_num();
for(unsigned int l = 0; l < A.n_rows; l++){
double temp = arma::norm(A.row(l) - point,2);
if(temp > local_distances[thread_id])
local_distances[thread_id] = temp;
}
// Is it necessary to put a critical section here?
#pragma omp critical
if(local_distances[thread_id] > distances[thread_id]){
distances[thread_id] = local_distances[thread_id];
}
}
std::cout << distances[distances.index_max()] << std::endl;
}
Is it necessary to put read/writes to the distances
vector in my case?
Upvotes: 3
Views: 2243
Reputation: 22670
Your code is fine. It is important to understand that
shared
.private
- so each thread has a local copy of it.So it's not really useful to declare a private vector of distances for each thread. You don't even have to have a separate local_distances
since access to distances
is correct. (Although it should be noted that access to distances
is highly inefficient because different threads will try to write data on the same cache line). Anyway, the whole thing is called a reduction, and OpenMP has easy support for that. You can write that like follows:
arma::mat A = arma::randu<arma::mat>(1000,700);
arma::rowvec point = A.row(0);
double distance = 0.;
#pragma omp parallel reduction(max:distance)
{
for(unsigned int l = 0; l < A.n_rows; l++){
distance = std::max(distance, arma::norm(A.row(l) - point,2));
}
}
std::cout << distance << std::endl;
Declaring a variable reduction
means that each thread gets a local copy, and after the parallel region, the reduction operation is applied to the set of local copies. This is the most concise, idiomatic and performance optimal solution.
P.S. With C++ code, it can sometimes be a bit difficult to figure out whether an access e.g. though operator[]
or arma::mat::row
is safe in a multi-threaded program. You always have to figure out whether your code implies writing and/or reading from shared data. Only one thread may write exclusively or many threads may read.
Upvotes: 3
Reputation: 2699
The difficulty of multithreading comes from the need to deal with shared mutable state. There is nothing wrong with one thread accessing mutable (changeable) data, or many threads concurrently accessing immutable (constant) data. It is only when multiple threads need to access the same mutable data that synchronization/critical sections are necessary.
Your code falls under the first case, as each thread_id indexes to unique data--only one thread changes data at a time.
Upvotes: 2