Reputation: 17
We have an unstructured tetrahedral mesh file in the following format:
element-ID nod1 nod2 nod3 nod4
1 452 3434 322 9000
2 2322 837 6673 2323
.
.
.
300000
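The serial C function that processes it looks like this:
void calc(void)
{
    /* nod1..nod4 hold the four node IDs of element i */
    for (int i = 1; i <= num_elements; i++)
    {
        int n1 = nod1[i];
        int n2 = nod2[i];
        int n3 = nod3[i];
        int n4 = nod4[i];
        /* different elements can share a node, so these updates can collide */
        ax[n1] += some code;
        ax[n2] += some code;
        ax[n3] += some code;
        ax[n4] += some code;
    }
}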
How can I implement the loop above in CUDA on a Tesla GPU without race conditions on ax? Or is there an alternative way to do this in CUDA?
Upvotes: 1
Views: 523
Reputation: 72349
The best solution is to use graph coloring to partition the mesh into subdomains, one per color, chosen so that no two elements of the same color share a node. The elements of one color can then be assembled in parallel without memory races. Using this approach, you only need as many passes through the mesh as there are colors to complete the assembly.
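To make the idea concrete, here is a minimal sketch of colored assembly, not a tuned implementation. It makes several assumptions that are not in the question: node indices are zero-based and stored as a flat array conn with four entries per element, ax is an array of double, and element_contribution is a made-up placeholder standing in for "some code". The names color_elements, assemble_color and assemble_by_color are likewise invented for illustration. The coloring is a simple greedy pass on the host; a dedicated graph library would likely produce fewer colors.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Greedy element coloring on the host. conn holds 4 zero-based node indices
// per element (assumed layout). On return color[e] is the color of element e,
// and no two elements that share a node have the same color.
int color_elements(const std::vector<int>& conn, int num_nodes,
                   std::vector<int>& color)
{
    const int num_elems = static_cast<int>(conn.size() / 4);
    color.assign(num_elems, -1);
    // node_colors[n]: colors already taken by elements that touch node n
    std::vector<std::vector<int>> node_colors(num_nodes);
    int num_colors = 0;
    for (int e = 0; e < num_elems; ++e) {
        int c = 0;
        bool clash = true;
        while (clash) {              // try colors 0, 1, 2, ... until one fits
            clash = false;
            for (int k = 0; k < 4; ++k) {
                const std::vector<int>& used = node_colors[conn[4 * e + k]];
                if (std::find(used.begin(), used.end(), c) != used.end()) {
                    clash = true;
                    ++c;
                    break;
                }
            }
        }
        color[e] = c;
        for (int k = 0; k < 4; ++k)
            node_colors[conn[4 * e + k]].push_back(c);
        num_colors = std::max(num_colors, c + 1);
    }
    return num_colors;
}

// Hypothetical placeholder for the per-node "some code" term in the question.
__device__ double element_contribution(int /*elem*/, int /*local_node*/)
{
    return 1.0;
}

// One thread per element of a single color. Elements of one color share no
// nodes, so the plain += below cannot race with another thread.
__global__ void assemble_color(const int* elem_ids, int count,
                               const int* conn, double* ax)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= count) return;
    int e = elem_ids[t];
    for (int k = 0; k < 4; ++k)
        ax[conn[4 * e + k]] += element_contribution(e, k);
}

// One kernel launch per color. d_conn and d_ax are device pointers that the
// caller has already allocated and filled.
void assemble_by_color(const std::vector<int>& color, int num_colors,
                       const int* d_conn, double* d_ax)
{
    for (int c = 0; c < num_colors; ++c) {
        std::vector<int> ids;        // elements that carry color c
        for (int e = 0; e < static_cast<int>(color.size()); ++e)
            if (color[e] == c) ids.push_back(e);
        if (ids.empty()) continue;

        int* d_ids = nullptr;
        cudaMalloc(&d_ids, ids.size() * sizeof(int));
        cudaMemcpy(d_ids, ids.data(), ids.size() * sizeof(int),
                   cudaMemcpyHostToDevice);

        const int threads = 256;
        const int blocks = (static_cast<int>(ids.size()) + threads - 1) / threads;
        assemble_color<<<blocks, threads>>>(d_ids, static_cast<int>(ids.size()),
                                            d_conn, d_ax);
        cudaDeviceSynchronize();     // finish this color before freeing d_ids
        cudaFree(d_ids);
    }
}
```

In a real code you would probably compute the coloring once, sort the elements by color, and keep the sorted index list on the device, so that each pass is just a kernel launch over a contiguous range. Error checking of the CUDA calls is omitted here for brevity.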
There is a lot of literature available on parallel finite element assembly, and a number of very good graph partitioning codes are available (for example METIS). Google Scholar is probably the best place to start learning about the technique.
Upvotes: 1