Reputation: 89
There is atomicAdd and atomicSub but it seems that atomicMul and atomicDiv don't exist! Is it possible? I need to implement the following code:
atomicMul(&accumulation[index],value)
How Can I do?
Upvotes: 2
Views: 3575
Reputation: 1
Kyungsu's answer was almost correct. On the line defining old == atomicCAS(...) though, he used __float_as_int when he should have used __int_as_float. I corrected his code below:
__device__ float atomicMul(float* address, float val){
//Implementation of atomic multiplication
//See https://stackoverflow.com/questions/43354798/atomic-multiplication-and-division
int* address_as_int = (int*)address;
int old = *address_as_int;
int assumed;
do {
assumed = old;
old = atomicCAS(address_as_int, assumed, __float_as_int(val * __int_as_float(assumed)));
} while (assumed != old);
return __int_as_float(old);}
Upvotes: 0
Reputation: 305
I'll supplement horus' answer based on what I understood about atomicCAS
. My answer can be wrong in detail, because I didn't look inside the atomicCAS
function but just read the documents about it (atomicCAS, Atomic Functions). Feel free to tackle my answer.
How atomicMul works
According to my understanding, the behavior of atomicCAS(int* address, int compare, int val)
is following.
*address
into old
(i.e old = *address
)(old == compare ? val : old)
to *address
. (At this point, the value of old
and *address
can be different depending on if the condition matched or not.)old
Understanding about its behavior gets better when we look at the atomicMul
function's definition together.
unsigned long long int* address_as_ull = (unsigned long long int*)address;
unsigned long long int oldValue = *address_as_ull, assumed; // Modified the name 'old' to 'oldValue' because it can be confused with 'old' inside the atomicCAS.
do {
assumed = oldValue;
// other threads can access and modify value of *address_as_ull between upper and lower line.
oldValue = atomicCAS(address_as_ull, assumed, __double_as_longlong(val *
__longlong_as_double(assumed)));
} while (assumed != oldValue); return __longlong_as_double(oldValue);
What we want to do is read the value from address
(its value is eqaul to address_as_ull
), and multiply some value to it and then write it back. The problem is other threads can access and modify value of *address
between read, modify, and write.
To ensure there was no intercept of other threads, we check if the value of *address
is equal to what we assumed
to be there. Say that other thread modified value of *address
after assumed=oldValue
and oldValue = atomicCAS(...)
. The modified value of *address
will be copied to old
variable inside the atomicCAS
(see behavior 1. of atomicCAS
above).
Since atomicCAS
updates *address
according to *address = (old == compare ? val : old)
, *address
won't be changed (old==*address
).
Then atomicCAS
returns old
and it goes into oldValue
so that the loop can keep going and we can try another shot at next iteration. When *address
is not modified between read and write, then val
is written to the *address
and loop will end.
How to write it for float
short answer :
__device__ float atomicMul(float* address, float val)
{
int* address_as_int = (int*)address;
int old = *address_as_int, assumed;
do {
assumed = old;
old = atomicCAS(address_as_int, assumed, __float_as_int(val *
__float_as_int(assumed)));
} while (assumed != old); return __int_as_float(old);
}
I didn't test it, so there can be some errors. Fix me if I'm wrong.
How does it work :
For some reason, atomicCAS
only supports integer types. So we should manually convert float/double type variable into integer type to input to the function and then re-convert the integer result to float
/double
type. What I've modified above is double
to float
and unsigned long long
to int
because the size of float
matches to int
.
Upvotes: 2
Reputation: 89
Ok, I solved. But I cannot understand how atomicMul works and I don't know how to write it for floats.
#include <stdio.h>
#include <cuda_runtime.h>
__device__ double atomicMul(double* address, double val)
{
unsigned long long int* address_as_ull = (unsigned long long int*)address;
unsigned long long int old = *address_as_ull, assumed;
do {
assumed = old;
old = atomicCAS(address_as_ull, assumed, __double_as_longlong(val * __longlong_as_double(assumed)));
} while (assumed != old); return __longlong_as_double(old);
}
__global__ void try_atomicMul(double* d_a, double* d_out)
{
atomicMul(d_out,d_a[threadIdx.x]);
}
int main()
{
double h_a[]={5,6,7,8}, h_out=1;
double *d_a, *d_out;
cudaMalloc((void **)&d_a, 4 * sizeof(double));
cudaMalloc((void **)&d_out,sizeof(double));
cudaMemcpy(d_a, h_a, 4 * sizeof(double),cudaMemcpyHostToDevice);
cudaMemcpy(d_out, &h_out, sizeof(double),cudaMemcpyHostToDevice);
dim3 blockDim(4);
dim3 gridDim(1);
try_atomicMul<<<gridDim, blockDim>>>(d_a,d_out);
cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
printf("%f \n",h_out);
cudaFree(d_a);
return 0;
}
Upvotes: 2