I'm trying to use local pointers to access memory that the current thread has affinity for.
Unfortunately, my local pointers don't seem to point where I think they should.
Does anyone have an idea what is going wrong?
Edit: I forgot to mention that the output below was generated by running this code with four threads, i.e. THREADS = 4.
My code:
#include <upc.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
    shared int * T = (shared int *) upc_all_alloc(12, sizeof(int));
    if(!T)
        upc_global_exit(-1);
    int i;
    upc_forall(i=0; i<12; i++; &T[i]) T[i] = i;
    upc_barrier;
    if(MYTHREAD == 0)
        for(i=0; i<12; i++) printf("thread %d, T[%d] = %d\n", MYTHREAD, i, T[i]);
    upc_barrier;
    int my_start = (12/THREADS + 1)*MYTHREAD;
    int my_end = (12/THREADS + 1)*(MYTHREAD+1) - 1;
    int* T_local = (int*)&T[my_start];
    for(i=my_start; i<=my_end; i++)
        printf("thread %d, T_local[%d] = %d, T[%d] = %d\n", MYTHREAD,
               i-my_start, T_local[i-my_start], i, T[i]);
    upc_barrier;
    return 0;
}
The output (THREADS = 4):
thread 0, T[0] = 0
thread 0, T[1] = 1
thread 0, T[2] = 2
thread 0, T[3] = 3
thread 0, T[4] = 4
thread 0, T[5] = 5
thread 0, T[6] = 6
thread 0, T[7] = 7
thread 0, T[8] = 8
thread 0, T[9] = 9
thread 0, T[10] = 10
thread 0, T[11] = 11
thread 0, T_local[0] = 0, T[0] = 0
thread 0, T_local[1] = 4, T[1] = 1
thread 0, T_local[2] = 8, T[2] = 2
thread 0, T_local[3] = 0, T[3] = 3
thread 1, T_local[0] = 4, T[4] = 4
thread 1, T_local[1] = 8, T[5] = 5
thread 1, T_local[2] = 0, T[6] = 6
thread 2, T_local[0] = 8, T[8] = 8
thread 2, T_local[1] = 0, T[9] = 9
thread 2, T_local[2] = 0, T[10] = 10
thread 2, T_local[3] = 0, T[11] = 11
thread 3, T_local[0] = 0, T[12] = 0
thread 3, T_local[1] = 0, T[13] = 0
thread 3, T_local[2] = 0, T[14] = 0
thread 3, T_local[3] = 0, T[15] = 0
thread 1, T_local[3] = 0, T[7] = 7
Your array T is allocated and declared with a cyclic layout (i.e. blocksize == 1). This means the first element with affinity to MYTHREAD is simply T[MYTHREAD]. Therefore you should probably initialize your pointer-to-local as follows:
int* T_local = (int*)&T[MYTHREAD];
In a cyclic layout the shared elements are dealt out round-robin to the threads, so each thread owns a non-contiguous subset of the array's elements. For example, with 4 threads, thread 0 has affinity to T[0], T[4], and T[8]. The correctly initialized T_local pointer-to-local on thread 0 accesses these elements in its local slice of the shared array (as T_local[0], T_local[1], and T_local[2], respectively).
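To see why thread 0's T_local[1] and T_local[2] print 4 and 8, here is a small plain-C model of the UPC distribution rule (it is not UPC code). The function names are hypothetical; `threads` stands in for THREADS and `block` for the array's block size, and it assumes, as in the question, that element 0 starts a block on thread 0.

```c
#include <assert.h>
#include <stdio.h>

/* Plain-C model of how UPC spreads a shared array over threads.
 * Hypothetical helpers: `threads` plays the role of THREADS and
 * `block` the layout block size (block == 1 is the cyclic default).
 * Assumes element 0 sits at the start of a block on thread 0. */

/* Thread that element i has affinity to. */
static int affinity_of(int i, int block, int threads)
{
    assert(i >= 0 && block > 0 && threads > 0);
    return (i / block) % threads;
}

/* Position of element i inside its owner's local slice, i.e. the index
 * a pointer-to-local such as T_local would use to reach it. */
static int local_offset(int i, int block, int threads)
{
    assert(i >= 0 && block > 0 && threads > 0);
    /* full rounds of (block * threads) elements before i, plus phase */
    return (i / (block * threads)) * block + i % block;
}

/* Print which global elements make up one thread's local slice. */
static void print_local_slice(int thread, int n, int block, int threads)
{
    for (int i = 0; i < n; i++)
        if (affinity_of(i, block, threads) == thread)
            printf("thread %d: T_local[%d] is T[%d]\n",
                   thread, local_offset(i, block, threads), i);
}
```

For the question's array, print_local_slice(0, 12, 1, 4) reports T_local[0], T_local[1] and T_local[2] as T[0], T[4] and T[8], matching the 0, 4, 8 that thread 0 printed. Calling it with block = 4 instead shows the contiguous four-element blocks that my_start and my_end implicitly assume.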
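Your computation of my_start and my_end seems to assume a different (larger) blocking factor than the one T actually uses, which is probably the source of your confusion. With THREADS = 4, 12/THREADS + 1 evaluates to 4, so each thread walks a contiguous run of four indices (thread 3 even starts at T[12], past the end of the 12-element array), whereas T has a block size of 1.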
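In UPC terms, the corrected loop runs k from 0 up to the number of locally owned elements and pairs T_local[k] with T[MYTHREAD + k*THREADS]. The plain-C sketch below models only that arithmetic for a cyclic array; `thread` and `threads` stand in for MYTHREAD and THREADS, and both helper names are made up for illustration.

```c
#include <assert.h>

/* Plain-C sketch (hypothetical helpers) of the loop bounds a thread needs
 * to walk its own elements of a cyclic (block size 1) shared array of n
 * elements. In UPC the loop would look roughly like:
 *   int *T_local = (int *)&T[MYTHREAD];
 *   for (k = 0; k < cyclic_local_count(12, MYTHREAD, THREADS); k++)
 *       ... T_local[k] aliases T[cyclic_global_index(MYTHREAD, k, THREADS)]
 */

/* Elements with affinity to `thread`: n / threads each, plus one extra
 * for each of the first n % threads threads. */
static int cyclic_local_count(int n, int thread, int threads)
{
    assert(n >= 0 && threads > 0 && thread >= 0 && thread < threads);
    return n / threads + (thread < n % threads ? 1 : 0);
}

/* Global index of the k-th local element of `thread`, i.e. the T[...]
 * that T_local[k] refers to when T_local = (int *)&T[thread]. */
static int cyclic_global_index(int thread, int k, int threads)
{
    assert(thread >= 0 && k >= 0 && threads > 0);
    return thread + k * threads;
}
```

With n = 12 and four threads, every thread owns three elements rather than the four that 12/THREADS + 1 assumes; with n = 10, threads 0 and 1 own three elements and threads 2 and 3 own two.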