horus
horus

Reputation: 89

Atomic multiplication and division?

There is atomicAdd and atomicSub but it seems that atomicMul and atomicDiv don't exist! Is it possible? I need to implement the following code:

 atomicMul(&accumulation[index],value)

How Can I do?

Upvotes: 2

Views: 3575

Answers (3)

Fernando Zigunov
Fernando Zigunov

Reputation: 1

Kyungsu's answer was almost correct. On the line defining old == atomicCAS(...) though, he used __float_as_int when he should have used __int_as_float. I corrected his code below:

__device__ float atomicMul(float* address, float val){
//Implementation of atomic multiplication
//See https://stackoverflow.com/questions/43354798/atomic-multiplication-and-division
int* address_as_int = (int*)address;
int old = *address_as_int;
int assumed;
do {
    assumed = old;
    old = atomicCAS(address_as_int, assumed, __float_as_int(val * __int_as_float(assumed)));
} while (assumed != old); 
return __int_as_float(old);}

Upvotes: 0

Kyungsu Stanley Kim
Kyungsu Stanley Kim

Reputation: 305

I'll supplement horus' answer based on what I understood about atomicCAS. My answer can be wrong in detail, because I didn't look inside the atomicCAS function but just read the documents about it (atomicCAS, Atomic Functions). Feel free to tackle my answer.

How atomicMul works

According to my understanding, the behavior of atomicCAS(int* address, int compare, int val) is following.

  1. Copy *address into old (i.e old = *address)
  2. Store (old == compare ? val : old) to *address. (At this point, the value of old and *address can be different depending on if the condition matched or not.)
  3. Return old

Understanding about its behavior gets better when we look at the atomicMul function's definition together.

unsigned long long int* address_as_ull = (unsigned long long int*)address; 
unsigned long long int oldValue = *address_as_ull, assumed; // Modified the name 'old' to 'oldValue' because it can be confused with 'old' inside the atomicCAS. 
do { 
  assumed = oldValue; 
  // other threads can access and modify value of *address_as_ull between upper and lower line. 
  oldValue = atomicCAS(address_as_ull, assumed, __double_as_longlong(val * 
                       __longlong_as_double(assumed))); 
} while (assumed != oldValue); return __longlong_as_double(oldValue);

What we want to do is read the value from address(its value is eqaul to address_as_ull), and multiply some value to it and then write it back. The problem is other threads can access and modify value of *address between read, modify, and write.

To ensure there was no intercept of other threads, we check if the value of *address is equal to what we assumed to be there. Say that other thread modified value of *address after assumed=oldValue and oldValue = atomicCAS(...). The modified value of *address will be copied to old variable inside the atomicCAS(see behavior 1. of atomicCAS above). Since atomicCAS updates *address according to *address = (old == compare ? val : old), *address won't be changed (old==*address).

Then atomicCAS returns old and it goes into oldValue so that the loop can keep going and we can try another shot at next iteration. When *addressis not modified between read and write, then val is written to the *address and loop will end.

How to write it for float

short answer :

__device__ float atomicMul(float* address, float val) 
{ 
  int* address_as_int = (int*)address; 
  int old = *address_as_int, assumed; 
  do { 
    assumed = old; 
    old = atomicCAS(address_as_int, assumed, __float_as_int(val * 
__float_as_int(assumed))); 
 } while (assumed != old); return __int_as_float(old);
}

I didn't test it, so there can be some errors. Fix me if I'm wrong.

How does it work : For some reason, atomicCAS only supports integer types. So we should manually convert float/double type variable into integer type to input to the function and then re-convert the integer result to float/double type. What I've modified above is double to float and unsigned long long to int because the size of float matches to int.

Upvotes: 2

horus
horus

Reputation: 89

Ok, I solved. But I cannot understand how atomicMul works and I don't know how to write it for floats.

#include <stdio.h>
#include <cuda_runtime.h>

__device__ double atomicMul(double* address, double val) 
{ 
  unsigned long long int* address_as_ull = (unsigned long long int*)address; 
  unsigned long long int old = *address_as_ull, assumed; 
  do { 
 assumed = old; 
 old = atomicCAS(address_as_ull, assumed, __double_as_longlong(val * __longlong_as_double(assumed))); 
 } while (assumed != old); return __longlong_as_double(old);
}      
__global__ void try_atomicMul(double* d_a, double* d_out)
{
     atomicMul(d_out,d_a[threadIdx.x]);
} 
int main()
{
  double h_a[]={5,6,7,8}, h_out=1;
  double *d_a, *d_out;

 cudaMalloc((void **)&d_a, 4 * sizeof(double));
 cudaMalloc((void **)&d_out,sizeof(double));

 cudaMemcpy(d_a, h_a, 4 * sizeof(double),cudaMemcpyHostToDevice);
 cudaMemcpy(d_out, &h_out, sizeof(double),cudaMemcpyHostToDevice);

 dim3 blockDim(4);
 dim3 gridDim(1);

  try_atomicMul<<<gridDim, blockDim>>>(d_a,d_out);
 cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);

 printf("%f \n",h_out);
 cudaFree(d_a);
 return 0;
}

Upvotes: 2

Related Questions