Nyaruko

Reputation: 4459

CUDA atomicAdd for int3

In CUDA, atomicAdd for double can be implemented using a while loop and an atomicCAS operation. But how could I implement an atomic add for type int3 efficiently?
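For reference, the double version mentioned in the question is usually written as a compare-and-swap retry loop, following the pattern shown in the CUDA C++ Programming Guide. This is a minimal sketch: the names `atomicAddDouble`, `addOnes` and `sumOnesDouble` are hypothetical, and on compute capability 6.0 and later a native `atomicAdd(double*, double)` already exists.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// CAS-loop atomic add for double: reinterpret the 64-bit value as an
// unsigned long long so the 64-bit atomicCAS can operate on it.
__device__ double atomicAddDouble(double* address, double val)
{
    unsigned long long* address_as_ull =
        reinterpret_cast<unsigned long long*>(address);
    unsigned long long old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Try to replace 'assumed' with 'assumed + val'; atomicCAS returns
        // whatever value was actually in memory.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);  // another thread updated it first: retry
    return __longlong_as_double(old);  // value before this thread's update
}

// Each of n threads adds 1.0 to the same double.
__global__ void addOnes(double* sum, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAddDouble(sum, 1.0);
}

// Hypothetical host helper (error handling kept minimal).
double sumOnesDouble(int n)
{
    double* d_sum;
    cudaMalloc(&d_sum, sizeof(double));
    cudaMemset(d_sum, 0, sizeof(double));  // all-zero bytes == 0.0
    addOnes<<<(n + 255) / 256, 256>>>(d_sum, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;
    double h_sum = 0.0;
    cudaMemcpy(&h_sum, d_sum, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_sum);
    return h_sum;
}
```

The same loop structure only works because a double fits in the 64 bits that `atomicCAS` can swap in one step.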

Upvotes: 3

Views: 354

Answers (1)

Robert Crovella

Reputation: 151799

After further consideration, I'm not sure how an atomicAdd on an int3 would be any different from three separate atomicAdd operations, each on an int location. Why not do that?

(An int3 cannot be loaded as a single quantity at the machine level in CUDA anyway; the compiler will necessarily split it into multiple loads. So although there is a hazard in reading the int3 while other threads are updating it, that hazard would be there anyway, with or without atomics.)
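A minimal sketch of that suggestion, using a hypothetical helper `atomicAddInt3` (the kernel and host helper names are hypothetical too): each component gets its own 32-bit `atomicAdd`, so every component ends up with the correct total, but the three updates are not one indivisible operation, and another thread may observe a partially updated int3 in between.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Three independent 32-bit atomics, one per component.
__device__ void atomicAddInt3(int3* address, int3 val)
{
    atomicAdd(&address->x, val.x);
    atomicAdd(&address->y, val.y);
    atomicAdd(&address->z, val.z);
}

// Each of n threads adds (1, 2, 3) to the same int3.
__global__ void addInt3Kernel(int3* acc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAddInt3(acc, make_int3(1, 2, 3));
}

// Hypothetical host helper (error handling kept minimal).
int3 runInt3Adds(int n)
{
    int3* d_acc;
    cudaMalloc(&d_acc, sizeof(int3));
    cudaMemset(d_acc, 0, sizeof(int3));
    addInt3Kernel<<<(n + 255) / 256, 256>>>(d_acc, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;
    int3 h_acc;
    cudaMemcpy(&h_acc, d_acc, sizeof(int3), cudaMemcpyDeviceToHost);
    cudaFree(d_acc);
    return h_acc;
}
```

Once the kernel has finished, the result is the same as a true 96-bit atomic add would produce; only intermediate observations differ.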

But to answer the specific question you asked: it's not possible to update all of an int3 with a single atomic operation.

int3 is a 96-bit type.

CUDA atomics support operations of up to 64 bits only. An atomic add for float2 (a 64-bit type) can be built on top of the 64-bit atomicCAS, and you could do something similar for types up to, e.g., short3 or short4.
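As a sketch of that idea for short4 (8 bytes, 8-byte aligned), the four components can be packed into one `unsigned long long` and updated with the 64-bit `atomicCAS` in the same retry loop used for double. `atomicAddShort4`, `addShort4Kernel` and `runShort4Adds` are hypothetical names, and overflow of a component when the sum is narrowed back to `short` is not checked.

```cuda
#include <cassert>
#include <cstring>
#include <cuda_runtime.h>

static_assert(sizeof(short4) == sizeof(unsigned long long),
              "short4 must fit in one 64-bit atomicCAS");

__device__ short4 atomicAddShort4(short4* address, short4 val)
{
    unsigned long long* addr = reinterpret_cast<unsigned long long*>(address);
    unsigned long long old = *addr, assumed;
    do {
        assumed = old;
        short4 cur;
        memcpy(&cur, &assumed, sizeof(cur));  // unpack the current value
        short4 next = make_short4(static_cast<short>(cur.x + val.x),
                                  static_cast<short>(cur.y + val.y),
                                  static_cast<short>(cur.z + val.z),
                                  static_cast<short>(cur.w + val.w));
        unsigned long long next_bits;
        memcpy(&next_bits, &next, sizeof(next));  // repack the new value
        old = atomicCAS(addr, assumed, next_bits);
    } while (assumed != old);  // another thread changed it: retry
    short4 result;
    memcpy(&result, &old, sizeof(result));
    return result;  // value before this thread's update
}

// Each of n threads adds (1, 2, 3, 4) to the same short4.
__global__ void addShort4Kernel(short4* acc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAddShort4(acc, make_short4(1, 2, 3, 4));
}

// Hypothetical host helper (error handling kept minimal).
short4 runShort4Adds(int n)
{
    short4* d_acc;
    cudaMalloc(&d_acc, sizeof(short4));
    cudaMemset(d_acc, 0, sizeof(short4));
    addShort4Kernel<<<(n + 255) / 256, 256>>>(d_acc, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;
    short4 h_acc;
    cudaMemcpy(&h_acc, d_acc, sizeof(short4), cudaMemcpyDeviceToHost);
    cudaFree(d_acc);
    return h_acc;
}
```

Unlike the three-separate-atomics approach, all four components here change together in one indivisible step, which is exactly what cannot be done for a 96-bit int3.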

You could alternatively use a reduction method or else a critical section. There are plenty of questions here on the SO cuda tag that discuss reductions and critical sections.
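A critical-section version could look roughly like the following sketch, which guards one int3 with a spin lock built from `atomicCAS`/`atomicExch` (0 = free, 1 = held). The names are hypothetical. Acquire and release are kept inside the same branch, the shape commonly recommended to avoid intra-warp deadlock; it is safe on GPUs with independent thread scheduling (Volta and later), while on older GPUs it depends on how the compiler lays out the branch. Every update is serialized, so expect it to be much slower than the three separate atomicAdds.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Adds val to *address while holding *lock.
__device__ void lockedAddInt3(int* lock, int3* address, int3 val)
{
    bool done = false;
    while (!done) {
        // Only the thread that swaps 0 -> 1 enters; others loop and retry.
        if (atomicCAS(lock, 0, 1) == 0) {
            __threadfence();
            // volatile so reads/writes go to memory, not a stale cached copy.
            volatile int3* p = address;
            p->x += val.x;
            p->y += val.y;
            p->z += val.z;
            __threadfence();      // make the writes visible before releasing
            atomicExch(lock, 0);  // release
            done = true;
        }
    }
}

// Each of n threads adds (1, 2, 3) under the lock.
__global__ void lockedAddKernel(int* lock, int3* acc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) lockedAddInt3(lock, acc, make_int3(1, 2, 3));
}

// Hypothetical host helper (error handling kept minimal).
int3 runLockedAdds(int n)
{
    int* d_lock;
    int3* d_acc;
    cudaMalloc(&d_lock, sizeof(int));
    cudaMalloc(&d_acc, sizeof(int3));
    cudaMemset(d_lock, 0, sizeof(int));  // lock starts free
    cudaMemset(d_acc, 0, sizeof(int3));
    lockedAddKernel<<<(n + 255) / 256, 256>>>(d_lock, d_acc, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;
    int3 h_acc;
    cudaMemcpy(&h_acc, d_acc, sizeof(int3), cudaMemcpyDeviceToHost);
    cudaFree(d_lock);
    cudaFree(d_acc);
    return h_acc;
}
```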

A reduction method could be implemented as follows:

  1. Each thread that wants to make an atomic update to a particular int3 location appends its update quantity to a queue or list for that location (for example, by using atomicAdd on a list index to claim a unique slot).

  2. Once the list generation is complete, launch a kernel to do a parallel reduction on the list, so as to produce the final reduced quantity that belongs in that location.
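The two steps above could be sketched as follows for a single int3 location. This is a minimal illustration, not a tuned reduction: `enqueueUpdates`, `reduceUpdates`, `reduceViaList` and the constant per-thread update (1, 2, 3) are hypothetical, and step 2 uses one block with a shared-memory tree reduction for simplicity.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kReduceThreads = 256;  // must be a power of two for the tree below

// Step 1: each thread claims a unique slot with atomicAdd and stores its update.
__global__ void enqueueUpdates(int3* list, int* count, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int slot = atomicAdd(count, 1);
        list[slot] = make_int3(1, 2, 3);  // hypothetical update quantity
    }
}

// Step 2: a single block reduces the whole list into one int3.
__global__ void reduceUpdates(const int3* list, const int* count, int3* out)
{
    __shared__ int3 partial[kReduceThreads];
    int n = *count;
    int tid = threadIdx.x;

    // Each thread first sums a strided slice of the list.
    int3 acc = make_int3(0, 0, 0);
    for (int i = tid; i < n; i += blockDim.x) {
        acc.x += list[i].x;
        acc.y += list[i].y;
        acc.z += list[i].z;
    }
    partial[tid] = acc;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            partial[tid].x += partial[tid + stride].x;
            partial[tid].y += partial[tid + stride].y;
            partial[tid].z += partial[tid + stride].z;
        }
        __syncthreads();
    }
    if (tid == 0) *out = partial[0];
}

// Hypothetical host helper (error handling kept minimal).
int3 reduceViaList(int n)
{
    int3 *d_list, *d_out;
    int* d_count;
    cudaMalloc(&d_list, n * sizeof(int3));
    cudaMalloc(&d_count, sizeof(int));
    cudaMalloc(&d_out, sizeof(int3));
    cudaMemset(d_count, 0, sizeof(int));
    enqueueUpdates<<<(n + 255) / 256, 256>>>(d_list, d_count, n);
    // Same (default) stream, so this runs after step 1 has completed.
    reduceUpdates<<<1, kReduceThreads>>>(d_list, d_count, d_out);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;
    int3 h_out;
    cudaMemcpy(&h_out, d_out, sizeof(int3), cudaMemcpyDeviceToHost);
    cudaFree(d_list);
    cudaFree(d_count);
    cudaFree(d_out);
    return h_out;
}
```

Only a 32-bit counter is updated atomically here; the 96-bit sums are combined without any atomics, which is what sidesteps the 64-bit limit.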

Upvotes: 2
