Cauchy
Cauchy

Reputation: 1747

OpenMP: Do I need a critical section when I access shared variable by thread id

I am using OpenMP to parallelize a for loop. I am trying to access a C++ Armadillo vector by thread id, but I am wondering if I have to put the access in a critical section even if the different threads access disjoint areas of memory. This is my code:

#include <armadillo>
#include <omp.h>
#include <iostream>

int main()
{

        arma::mat A = arma::randu<arma::mat>(1000,700);
        arma::rowvec point = A.row(0);
        arma::vec distances = arma::zeros(omp_get_max_threads());

        #pragma omp parallel shared(A,point,distances)
        {

                arma::vec local_distances = arma::zeros(omp_get_num_threads());
                int thread_id = omp_get_thread_num();

                for(unsigned int l = 0; l < A.n_rows; l++){
                        double temp = arma::norm(A.row(l) - point,2);
                        if(temp > local_distances[thread_id])
                                local_distances[thread_id] = temp;
                }

                // Is it necessary to put a critical section here?
                #pragma omp critical 
                if(local_distances[thread_id] > distances[thread_id]){
                        distances[thread_id] = local_distances[thread_id];
                }

        }

        std::cout << distances[distances.index_max()] << std::endl;

}

Is it necessary to put read/writes to the distances vector in my case?

Upvotes: 3

Views: 2243

Answers (2)

Zulan
Zulan

Reputation: 22670

Your code is fine. It is important to understand that

  • Variables declared outside of the parallel region are implicitly shared.
  • Variables declared inside of the parallel region are implicitly private - so each thread has a local copy of it.

So it's not really useful to declare a private vector of distances for each thread. You don't even have to have a separate local_distances since access to distances is correct. (Although it should be noted that access to distances is highly inefficient because different threads will try to write data on the same cache line). Anyway, the whole thing is called a reduction, and OpenMP has easy support for that. You can write that like follows:

arma::mat A = arma::randu<arma::mat>(1000,700);
arma::rowvec point = A.row(0);
double distance = 0.;
#pragma omp parallel reduction(max:distance)
{
        for(unsigned int l = 0; l < A.n_rows; l++){
                distance = std::max(distance, arma::norm(A.row(l) - point,2));
        }
}
std::cout << distance << std::endl;

Declaring a variable reduction means that each thread gets a local copy, and after the parallel region, the reduction operation is applied to the set of local copies. This is the most concise, idiomatic and performance optimal solution.

P.S. With C++ code, it can sometimes be a bit difficult to figure out whether an access e.g. though operator[] or arma::mat::row is safe in a multi-threaded program. You always have to figure out whether your code implies writing and/or reading from shared data. Only one thread may write exclusively or many threads may read.

Upvotes: 3

joshwilsonvu
joshwilsonvu

Reputation: 2699

The difficulty of multithreading comes from the need to deal with shared mutable state. There is nothing wrong with one thread accessing mutable (changeable) data, or many threads concurrently accessing immutable (constant) data. It is only when multiple threads need to access the same mutable data that synchronization/critical sections are necessary.

Your code falls under the first case, as each thread_id indexes to unique data--only one thread changes data at a time.

Upvotes: 2

Related Questions