Reputation: 17
We have an unstructured tetrahedral mesh file containing following format:
element-ID nod1 nod2 nod3 nod4
1 452 3434 322 9000
2 2322 837 6673 2323
.
.
.
300000
We partitioned the above mesh for partition size of 2048 each. For each partition size of 2048 contains unique nod1 nod2 nod3 nod4 values, we pass 1 block and 512 threads at different start index.
In a cuda file, we have
__global__ void calc(double d_ax,int *nod1,int *node2,int *nod3,int *nod4,int start,int size)
{
int n1,n2,n3,n4;
int i = blockIdx.x * blockDim.x + threadIdx.x + start;
if ( i < size )
{
n1=nod1[i];
n2=nod2[i];
n3=nod3[i];
n4=nod4[i];
ax[n1] += some code;
ax[n2] += some code;
ax[n3] += some code;
ax[n4] += some code;
}
}
We call the kernel as
calc<<<1,512>>>(d_ax,....,0,512);
calc<<<1,512>>>(d_ax,....,512,512);
calc<<<1,512>>>(d_ax,....,1024,512);
calc<<<1,512>>>(d_ax,....1536,512);
the above code works well but the problem is we get different results using more than one block at a time. For example:
calc<<<2,512>>>(d_ax,....,0,1024);
calc<<<2,512>>>(d_ax,....,1024,1024);
Can anyone help me?
Upvotes: 0
Views: 303
Reputation: 21778
For each partition size of 2048 contains unique nod1 nod2 nod3 nod4 values
but in two partition sets the same node index can appear? If in two different blocks you have
Block 1:
ax[1234]=do something
Block 2:
ax[1234]=do something else
it smells as a race condition. You never know which one of the two blocks will be faster to write....
Upvotes: 0
Reputation: 72348
I am not sure how you expect anyone to tell you what might be wrong when the code you have posted is incomplete and uncompilable, but if in your single block case you really are calling the kernel as you have posted, this is what should happen:
calc<<<1,512>>>(d_ax,....,0,512); // process first 512 elements
calc<<<1,512>>>(d_ax,....,512,512); // start >= 512, size == 512, does nothing
calc<<<1,512>>>(d_ax,....,1024,512); // start >= 1024, size == 512, does nothing
calc<<<1,512>>>(d_ax,....1536,512); // start >= 1536, size == 512, does nothing
So irrespective of whether your code might be broken when run using multiple blocks, your results for the single block case are probably wrong, and the whole point of your question is probably irrelevant as a result.
If you want a better answer, edit your question so it contains a complete description of the problem and concise, complete code that could actually be compiled. Otherwise this is about as much as anybody could guess from the information you have provided.
Upvotes: 1