K rakesh

Reputation: 17

Reduction code for CUDA GPU in finite element assembly

We have an unstructured tetrahedral mesh file in the following format:

element-ID   nod1   nod2   nod3   nod4
1             452   3434    322   9000
2            2322    837   6673   2323
.
.
.
300000

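The assembly is done by a C function like this:

void calc(void)
{
   /* elements are numbered from 1, as in the mesh file */
   for (int i = 1; i <= no_of_elements; i++)
   {
      /* the four nodes of element i */
      int n1 = nod1[i];
      int n2 = nod2[i];
      int n3 = nod3[i];
      int n4 = nod4[i];

      /* different elements can update the same node of ax */
      ax[n1] += some code;
      ax[n2] += some code;
      ax[n3] += some code;
      ax[n4] += some code;
   }
}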

How can the above loop be implemented in CUDA on a Tesla GPU without race conditions, or is there an alternative way to do it on CUDA?

Upvotes: 1

Views: 523

Answers (1)

talonmies

Reputation: 72349

The best solution is to use graph coloring to partition the mesh into subdomains. Elements are colored so that no two elements sharing a node have the same color, which means all the elements of one color can be assembled in parallel without memory races. Using this approach you only require as many passes through the mesh as there are colors in order to complete assembly.
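As a rough illustration of the coloring idea, here is a minimal host-side sketch in plain C. It uses a simple greedy coloring, which is only one of several possible strategies. The names `color_elements`, `assemble_by_color`, `conn`, `contrib` and the `MAX_COLORS` limit are assumptions made for the example: `conn` stores four zero-based node indices per element, and `contrib` stands in for the "some code" terms in the question.

```c
#include <assert.h>
#include <stdlib.h>

#define NODES_PER_ELEM 4
#define MAX_COLORS 64  /* assumed limit: one bit per color in a 64-bit mask */

/* Greedy coloring: elements that share a node never get the same color.
 * conn holds NODES_PER_ELEM zero-based node indices per element.
 * Fills color[e] and returns the number of colors used (-1 if out of memory). */
int color_elements(int n_elems, int n_nodes, const int *conn, int *color)
{
    if (n_elems <= 0 || n_nodes <= 0) return 0;

    /* used[n] has bit c set if an element touching node n already has color c */
    unsigned long long *used = calloc((size_t)n_nodes, sizeof *used);
    if (!used) return -1;

    int n_colors = 0;
    for (int e = 0; e < n_elems; e++) {
        /* collect the colors already taken at any of this element's nodes */
        unsigned long long forbidden = 0;
        for (int k = 0; k < NODES_PER_ELEM; k++)
            forbidden |= used[conn[NODES_PER_ELEM * e + k]];

        /* pick the lowest color that is not forbidden */
        int c = 0;
        while (c < MAX_COLORS && ((forbidden >> c) & 1ULL)) c++;
        assert(c < MAX_COLORS);  /* the sketch gives up beyond MAX_COLORS */

        color[e] = c;
        for (int k = 0; k < NODES_PER_ELEM; k++)
            used[conn[NODES_PER_ELEM * e + k]] |= 1ULL << c;
        if (c + 1 > n_colors) n_colors = c + 1;
    }
    free(used);
    return n_colors;
}

/* One pass per color. Inside a pass no two elements touch the same node,
 * so the iterations over e are independent of each other. On the GPU each
 * pass would correspond to one kernel launch with one thread per element. */
void assemble_by_color(int n_elems, int n_colors, const int *conn,
                       const int *color, const double *contrib, double *ax)
{
    for (int c = 0; c < n_colors; c++) {
        for (int e = 0; e < n_elems; e++) {
            if (color[e] != c) continue;
            for (int k = 0; k < NODES_PER_ELEM; k++)
                ax[conn[NODES_PER_ELEM * e + k]] += contrib[NODES_PER_ELEM * e + k];
        }
    }
}
```

In practice you would probably sort the element indices by color once, so that each pass works on a contiguous range instead of scanning and skipping elements of other colors.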

There is a lot of literature available on parallel finite element assembly, and there are a number of very good graph partitioning codes available (for example METIS). Google Scholar is probably the best place to start learning about the technique.

Upvotes: 1
