Reputation: 17294
I've just started trying some CUDA programming, coming from OpenGL/GLSL.
In OpenGL, atomic counters appear to be separate from main graphics memory and have next to zero overhead (unlike the significantly slower, but not exactly slow, atomic operations on image units or "bindless graphics" memory). They are limited in that there is a fixed number of them (~16k) and they can ONLY be read, incremented or decremented, which I guess is why they have lower overhead.
Is there an equivalent interface to these extremely fast atomic counters in CUDA?
I want to write something like this:
if (some_condition)
{
    index = atomicIncrement(globalCounter);
    output[index] = myValue;
}
The same result could be accomplished with a radix sort or "histopyramid"-like compaction but atomic counters are just simpler.
Upvotes: 2
Views: 678
Reputation: 21128
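Have you tried using atomicAdd()? I don't know about the OpenGL atomics, but I'd imagine they're similar. atomicAdd() returns the value that was stored at the address before the addition, so adding 1 to a global counter gives each thread a unique index.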
int atomicAdd(int* address, int val);
unsigned int atomicAdd(unsigned int* address, unsigned int val);
unsigned long long int atomicAdd(unsigned long long int* address, unsigned long long int val);
float atomicAdd(float* address, float val);
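For the append pattern in the question, a minimal sketch could look like the following. The names compactPositive and compactOnGpu, the "value is positive" condition, and the block size of 256 are assumptions made for illustration; they do not come from the question or from any library. Error checking is omitted to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Append every positive input element to `output`. A single global counter
// hands out the slots. The order of the output is not deterministic.
__global__ void compactPositive(const float* input, int n,
                                float* output, unsigned int* counter)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && input[i] > 0.0f) {
        // atomicAdd returns the value stored before the addition,
        // so every thread that gets here receives a distinct index.
        unsigned int index = atomicAdd(counter, 1u);
        output[index] = input[i];
    }
}

// Hypothetical host helper: copies the data over, runs the kernel and
// returns the compacted values. CUDA error checks are left out.
std::vector<float> compactOnGpu(const std::vector<float>& hostIn)
{
    int n = static_cast<int>(hostIn.size());
    if (n == 0) return {};

    float* dIn = nullptr;
    float* dOut = nullptr;
    unsigned int* dCounter = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));   // worst case: every element passes
    cudaMalloc(&dCounter, sizeof(unsigned int));
    cudaMemcpy(dIn, hostIn.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dCounter, 0, sizeof(unsigned int));  // counter starts at zero

    int threads = 256;                       // assumed block size
    int blocks = (n + threads - 1) / threads;
    compactPositive<<<blocks, threads>>>(dIn, n, dOut, dCounter);

    // This blocking copy also waits for the kernel to finish.
    unsigned int count = 0;
    cudaMemcpy(&count, dCounter, sizeof(unsigned int), cudaMemcpyDeviceToHost);

    std::vector<float> result(count);
    if (count > 0) {
        cudaMemcpy(result.data(), dOut, count * sizeof(float),
                   cudaMemcpyDeviceToHost);
    }
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dCounter);
    return result;
}
```

The counter here lives in ordinary global memory, so this is not a dedicated counter unit like the OpenGL one; whether it is fast enough depends on how many threads hit the same address. The output buffer has to be sized for the worst case, and the counter has to be reset to zero before each launch.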
Upvotes: 1