GeauxEric

Reputation: 3070

local pointers in OpenMP

Local variables should be automatically private to each thread. How about a local pointer pointing to some address outside the parallel region, like

A * a = new A[10];
int i, j;

for (i = 0; i < 10; i++){
    A * local_i = &a[i];
    // do sth ...
    #pragma omp parallel for
    for (j = 0; j < 10; j++){
        A * local_j = &a[j];
        local_j->x = 1.0f;
        // ...
    }
}
delete[]a;

Should I make local_a private and a firstprivate? I am new to OpenMP and C, actually.

Upvotes: 1

Views: 6952

Answers (2)

Z boson

Reputation: 33699

It's important to know that OpenMP treats static and dynamic arrays differently. In the example you gave a static array is more appropriate. Let's look at what happens when you use shared, private, and firstprivate on static and dynamic arrays. I will print the thread number, address of a, the value of a, and the values of the array for each case.

Static arrays:

int a[10];
for(int i=0; i<10; i++) a[i]=i;

#pragma omp parallel
{
    #pragma omp critical
    {
        printf("ithread %d %p %p :", omp_get_thread_num(), &a, a);
        for(int i=0; i<10; i++) printf("%d ", a[i]);
        printf("\n");
    }
}
//ithread 1 0x7fff3f43f9b0 0x7fff3f43f9b0 :0 1 2 3 4 5 6 7 8 9 
//ithread 3 0x7fff3f43f9b0 0x7fff3f43f9b0 :0 1 2 3 4 5 6 7 8 9 

Notice that each thread has the same address for a. Now let's try passing a as private.

#pragma omp parallel private(a)
//ithread 0 0x7fffc7897d60 0x7fffc7897d60 :4 0 -1393351936 2041147031 4 0 0 0 4196216 0 
//ithread 1 0x7fa65f275df0 0x7fa65f275df0 :0 0 0 0 0 0 1612169760 32678 1596418496 32678 

Now each thread has its own private a, and each private version lives at a different memory address. However, the values of the array were NOT copied. Now let's try firstprivate(a).

#pragma omp parallel firstprivate(a)
//ithread 0 0x7ffffb5ba860 0x7ffffb5ba860 :0 1 2 3 4 5 6 7 8 9 
//ithread 3 0x7f50a8272df0 0x7f50a8272df0 :0 1 2 3 4 5 6 7 8 9 

The only difference now is that the values of a ARE copied.
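One consequence of the copy is worth spelling out: writes made inside the region go to each thread's private array, not to the original. The sketch below is only an illustration (the function name `sum_after_firstprivate_array` is made up for this example), and it assumes the code is compiled with OpenMP enabled (for example `-fopenmp` with GCC). If the pragma is ignored, the loop runs serially on the original array and the result differs.

```cpp
// Hypothetical helper, not from the original answer.
// Assumption: compiled with OpenMP enabled; otherwise the pragma is ignored
// and the loop writes straight into the original array.
int sum_after_firstprivate_array() {
    int a[10];
    for (int i = 0; i < 10; i++) a[i] = 0;

    // firstprivate(a) gives every thread its own 10-element copy of a,
    // initialized from the original.
    #pragma omp parallel for firstprivate(a)
    for (int j = 0; j < 10; j++) {
        a[j] = 1;  // modifies only the calling thread's private copy
    }

    // Back in serial code, a refers to the original array again.
    int sum = 0;
    for (int i = 0; i < 10; i++) sum += a[i];
    return sum;
}
```

With OpenMP enabled the original array should be untouched, so the function is expected to return 0 rather than 10.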

Dynamic arrays:

int *a = new int[10];
for(int i=0; i<10; i++) a[i]=i;

Let's first look at passing a as shared:

#pragma omp parallel
//ithread 2 0x7fff86a02cc8 0x9ff010 :0 1 2 3 4 5 6 7 8 9 
//ithread 0 0x7fff86a02cc8 0x9ff010 :0 1 2 3 4 5 6 7 8 9

Each thread has the same a, just like with a static array. The difference appears when we use private.

#pragma omp parallel private(a)
//segmentation fault

Each thread gets its own private a, just like with a static array, but the private pointer is uninitialized, so it points to some arbitrary, unallocated address. When we try to read through it we get a segmentation fault. We can fix this using firstprivate(a):

#pragma omp parallel firstprivate(a)
//ithread 0 0x7fff2baa2b48 0x8bd010 :0 1 2 3 4 5 6 7 8 9 
//ithread 1 0x7f3031fc5e28 0x8bd010 :0 1 2 3 4 5 6 7 8 9

Now we see that each thread has its own private a; however, unlike with static arrays, each copy still points to the SAME memory address. So the pointers are private, but the address they hold is the same. This effectively means the memory is still shared.
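To make the "memory is still shared" point concrete, here is a minimal sketch (the function name `sum_after_firstprivate_pointer` is invented for illustration). Each thread writes through its private copy of the pointer, and the writes are still visible in the single heap block after the region ends.

```cpp
// Hypothetical helper, not from the original answer.
int sum_after_firstprivate_pointer() {
    int *a = new int[10];
    for (int i = 0; i < 10; i++) a[i] = 0;

    // Each thread gets its own copy of the pointer a, but every copy holds
    // the same address, so all writes land in the same heap block.
    #pragma omp parallel for firstprivate(a)
    for (int j = 0; j < 10; j++) {
        a[j] = 1;  // each iteration touches a distinct element, so no race
    }

    int sum = 0;
    for (int i = 0; i < 10; i++) sum += a[i];
    delete[] a;
    return sum;  // every element was set to 1 through a private pointer copy
}
```

The function returns 10 whether or not OpenMP is enabled, which is exactly the point: making the pointer private does not make the pointed-to data private.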

How to allocate private versions of dynamic arrays

To get private versions of dynamic arrays for each thread, I don't recommend allocating them outside of the parallel region, because if you're not careful it's easy to cause false sharing. See the question "OpenMP implementation of reduction" for an example of false sharing caused by allocating the memory outside of the parallel region. You could use a double pointer, but that does not necessarily fix the false sharing issue, and it won't fix another problem on multi-socket systems. On those systems it's important that sockets don't share the same page (or you get another kind of false sharing). If you let each thread allocate its own memory, you don't have to worry about this.

In general I would allocate a private version of the array in each thread and then merge them in a critical section. However, there are cases where it makes sense to allocate only once, but that is complicated to do right; see "Fill histograms (array reduction) in parallel with OpenMP without using a critical section".
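The allocate-per-thread-then-merge pattern could look roughly like the sketch below. The `histogram` function and its parameters are assumptions made up for this example, not code from the linked questions, and it assumes the input values are non-negative.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: counts how many values fall into each of nbins
// buckets (value % nbins). Assumes all values in data are non-negative.
std::vector<int> histogram(const std::vector<int> &data, int nbins) {
    assert(nbins > 0);
    std::vector<int> hist(nbins, 0);
    const long n = static_cast<long>(data.size());

    #pragma omp parallel
    {
        // Allocated inside the parallel region, so each thread owns its
        // own buffer and no clause is needed to make it private.
        std::vector<int> local(nbins, 0);

        #pragma omp for nowait
        for (long i = 0; i < n; i++) {
            local[data[i] % nbins]++;
        }

        // Merge the per-thread results one thread at a time.
        #pragma omp critical
        {
            for (int b = 0; b < nbins; b++) hist[b] += local[b];
        }
    }
    return hist;
}
```

For example, `histogram({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1}, 4)` yields the counts 4, 4, 2, 2. The critical section runs once per thread rather than once per element, so its cost stays small as long as the number of bins is modest.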

Upvotes: 11

ROTA

Reputation: 111

It depends on what you want to do with the array. If each thread should access the same array, then you don't need to make the pointer thread-private.

In your case it does not matter whether you make a private or not, because the pointer only stores the address of the memory. It makes no difference if all threads use the same variable to access the same address or if each one has its own copy of the same address.
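As a rough sketch of the question's loop with no data-sharing clauses at all: `a` is shared by default, while `j` and `local_j` are declared inside the construct, so each thread gets its own. The struct `A` below is a stand-in (the asker's real type is not shown), and `count_updated` is a name invented for this example.

```cpp
// Hypothetical stand-in for the asker's type A.
struct A { float x = 0.0f; };

// Hypothetical helper: runs the question's loops and counts updated elements.
int count_updated() {
    A *a = new A[10];
    for (int i = 0; i < 10; i++) {
        // No clause needed: a is shared by default, and variables declared
        // inside the loop body (local_j) are private to each thread.
        #pragma omp parallel for
        for (int j = 0; j < 10; j++) {
            A *local_j = &a[j];
            local_j->x = 1.0f;
        }
    }

    int updated = 0;
    for (int k = 0; k < 10; k++) {
        if (a[k].x == 1.0f) updated++;
    }
    delete[] a;
    return updated;
}
```

All 10 elements end up updated, and adding firstprivate(a) to the pragma would not change that result.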

If each thread should have its own copy of the array, then you need to allocate and delete the memory for each thread separately. If your compiler calls the right constructor for C++ objects that are made thread-private, you can use a std::vector in your case.

std::vector<A> a(10); // must hold 10 elements before a[j] is accessed
#pragma omp parallel for firstprivate(a)
for (j = 0; j < 10; j++)
{
    A * local_j = &a[j];
    local_j->x = 1.0f;
    // ...
}

Otherwise you need to allocate a separate memory block for each thread.

// omp_get_num_threads() returns 1 outside a parallel region, so ask for the maximum team size
const int num_threads = omp_get_max_threads();
A** a;
a = new A*[num_threads];
for (int i=0; i<num_threads; i++)
  a[i] = new A[10];

#pragma omp parallel for firstprivate(a)
for (j = 0; j < 10; j++)
{
    const int thread_id = omp_get_thread_num();
    A * local_j = &a[thread_id][j];
    local_j->x = 1.0f;
    // ...
}

for (int i=0; i<num_threads; i++)
  delete [] a[i];
delete [] a;

Upvotes: 4
