user2042696


UPC Local Pointers Access Random Memory

I'm trying to use local pointers to access memory that the current thread has affinity for.

Unfortunately, my local pointers don't seem to point where I think they should.

Anyone have an idea what is going wrong?

Edit: I forgot to mention that the output below is generated running this code with four threads, i.e. THREADS = 4.

My code:

#include <upc.h>
#include <stdio.h>
#include <stdlib.h>

int main(){

    shared int * T = (shared int *) upc_all_alloc(12, sizeof(int));
    if(!T)
        upc_global_exit(-1);

    int i;
    upc_forall(i=0; i<12; i++; &T[i]) T[i] = i;
    upc_barrier;

    if(MYTHREAD == 0)
        for(i=0; i<12; i++) printf("thread %d, T[%d] = %d\n", MYTHREAD, i, T[i]);
    upc_barrier;

    int my_start = (12/THREADS + 1)*MYTHREAD;
    int my_end = (12/THREADS + 1)*(MYTHREAD+1) - 1;

    int* T_local = (int*)&T[my_start];

    for(i=my_start; i<=my_end; i++)
        printf("thread %d, T_local[%d] = %d, T[%d] = %d\n", MYTHREAD,
                i-my_start, T_local[i-my_start], i, T[i]);
    upc_barrier;

    return 0;
}

The output (THREADS = 4):

thread 0, T[0] = 0
thread 0, T[1] = 1
thread 0, T[2] = 2
thread 0, T[3] = 3
thread 0, T[4] = 4
thread 0, T[5] = 5
thread 0, T[6] = 6
thread 0, T[7] = 7
thread 0, T[8] = 8
thread 0, T[9] = 9
thread 0, T[10] = 10
thread 0, T[11] = 11
thread 0, T_local[0] = 0, T[0] = 0
thread 0, T_local[1] = 4, T[1] = 1
thread 0, T_local[2] = 8, T[2] = 2
thread 0, T_local[3] = 0, T[3] = 3
thread 1, T_local[0] = 4, T[4] = 4
thread 1, T_local[1] = 8, T[5] = 5
thread 1, T_local[2] = 0, T[6] = 6
thread 2, T_local[0] = 8, T[8] = 8
thread 2, T_local[1] = 0, T[9] = 9
thread 2, T_local[2] = 0, T[10] = 10
thread 2, T_local[3] = 0, T[11] = 11
thread 3, T_local[0] = 0, T[12] = 0
thread 3, T_local[1] = 0, T[13] = 0
thread 3, T_local[2] = 0, T[14] = 0
thread 3, T_local[3] = 0, T[15] = 0
thread 1, T_local[3] = 0, T[7] = 7

Upvotes: 2

Views: 57

Answers (1)

Dan Bonachea

Reputation: 2487

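Your array T is allocated and declared with a cyclic layout (i.e. block size == 1): the type shared int * has the default block size of 1, and upc_all_alloc(12, sizeof(int)) allocates 12 blocks of one int each. This means the first element with affinity to MYTHREAD is simply T[MYTHREAD]. Therefore you should probably initialize your pointer-to-local as follows: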

int* T_local = (int*)&T[MYTHREAD];

In a cyclic layout the shared elements are dealt out round-robin to the threads, which means each thread holds a non-contiguous set of the distributed array's elements. For example, with 4 threads, thread 0 has affinity to T[0], T[4], and T[8]. The correctly initialized T_local pointer-to-local on thread 0 will access these elements in its local slice of the shared array (as T_local[0], T_local[1], and T_local[2], respectively).

Your computation of my_start and my_end seems to assume a different (larger) blocking factor than the one T actually uses, which is probably the source of your confusion.

Upvotes: 1
