Reputation: 737
I am trying to generate random numbers within a CUDA kernel. I want the random numbers to be integers drawn from a uniform distribution, starting from 1 and going up to 8. The random numbers should be different for each thread. The upper bound of the range also varies from one thread to another: it might be as low as 2 in one thread and as high as 8 in another, but never higher than that. Here is an example of how I want the numbers to be generated:
In thread#1 --> maximum of the range is 2 and so the random number should be between 1 and 2
In thread#2 --> maximum of the range is 6 and so the random number should be between 1 and 6
In thread#3 --> maximum of the range is 5 and so the random number should be between 1 and 5
and so on...
Upvotes: 10
Views: 34307
Reputation: 301
For a safer general purpose random integer function using curand_uniform() that can handle larger integers:
#include <math.h>
int rand = (int)(ceil((curand_uniform(&state)*(RANGE + 1))) - 1);
Multiply your float by RANGE + 1, then take the ceiling, subtract 1, and cast to an integer. Taking the ceiling produces a whole number between 1 and RANGE + 1, so when we subtract one we get an integer between 0 and RANGE.
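The same ceiling trick can be pointed at the 1..max range the question asks for. A minimal sketch, assuming u comes from curand_uniform() and therefore lies in (0.0, 1.0]; the helper names uniform_to_0_range and uniform_to_1_max and the HD macro are illustrative and not part of cuRAND:

```cpp
#include <cassert>
#include <math.h>

// Lets the helpers compile as plain C++ and as device code under nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Maps u in (0, 1] to an integer in [0, range], as in the one-liner above.
HD inline int uniform_to_0_range(float u, int range) {
    assert(range >= 0);
    return static_cast<int>(ceilf(u * (range + 1))) - 1;
}

// Maps u in (0, 1] to an integer in [1, max_val]. The ceiling alone already
// starts at 1, so no subtraction is needed.
HD inline int uniform_to_1_max(float u, int max_val) {
    assert(max_val >= 1);
    return static_cast<int>(ceilf(u * max_val));
}
```

In a kernel this could be called as uniform_to_1_max(curand_uniform(&state), my_max), where my_max is a hypothetical per-thread upper bound such as 2, 6 or 5.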
Additional discussion:
If 0.0 were included in the output of curand_uniform() and 1.0 were not, then
(int)((curand_uniform(&state)*(RANGE + 1)));
would produce an integer between 0 and RANGE. Truncating to an integer would be safe because RANGE + 1 is not a possible result, and the distribution would cover our entire range.
Since 0.0 is excluded and 1.0 is included, all possible results need to be shifted down by some amount so that they truncate to an integer safely. This is accomplished by adding .999999 to RANGE before multiplying:
(int)((curand_uniform(&state)*(RANGE + .999999)))
The solution is not perfect, however, because the scaled value can never fall between RANGE + .999999 and RANGE + 1, so not every value that would truncate to RANGE is represented. This produces a slight bias against the greatest integer in our range.
The greatest possible offset in IEEE 754 single precision is .999999940395355224609375, the largest float less than one. The problem with using this value is that, once the integer part is 1 or more, the sum rounds up to the next whole number whenever the fractional part exceeds approximately .999999. In fact, the offset must shrink as the integer grows, because the integer part takes up more of the float's 24 significant bits. For integers greater than 10000000 you would have to amend the solution, since a float has no bits left for a fractional part at that magnitude.
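The rounding effect described here can be checked on the host. A small sketch, assuming IEEE 754 single precision (which CUDA and typical hosts use); the function name offset_trick_is_safe is made up for illustration. It feeds in the worst case, a uniform sample of exactly 1.0, and reports whether truncation still gives the intended top value:

```cpp
#include <cassert>
#include <math.h>

// Returns true when truncating 1.0f * (span + 0.999999), the largest value
// the scaled sample can take, still lands on span rather than span + 1.
inline bool offset_trick_is_safe(int span) {
    assert(span >= 0);
    float scaled = 1.0f;          // curand_uniform() can return exactly 1.0
    scaled *= (span + 0.999999);  // product computed in double, stored as float
    return static_cast<int>(truncf(scaled)) == span;
}
```

With this arithmetic the check holds for spans up to 31 and fails from 32 onward, where the nearest float to span + 0.999999 is span + 1 itself. That is well above the span of 7 needed for the 1 to 8 range in the question.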
Upvotes: 1
Reputation: 151829
EDIT: I've edited my answer to fix some of the deficiencies pointed out in the other answers (@tudorturcu) and comments.
Something like this in your device code:
int idx = threadIdx.x+blockDim.x*blockIdx.x;
// assume have already set up curand and generated state for each thread...
// assume ranges vary by thread index
float myrandf = curand_uniform(&(my_curandstate[idx]));
myrandf *= (max_rand_int[idx] - min_rand_int[idx] + 0.999999);
myrandf += min_rand_int[idx];
int myrand = (int)truncf(myrandf);
You should #include <math.h> for truncf.
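The scale, offset and truncate steps of the snippet can also be wrapped in a helper whose endpoints are easy to test on the host. This is only a sketch: the name scale_uniform_to_int and the HD macro are illustrative, and u is assumed to come from curand_uniform(), so it lies in (0.0, 1.0].

```cpp
#include <cassert>
#include <math.h>

// Lets the helper compile as plain C++ and as device code under nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Returns an integer in [lo, hi] for u in (0.0, 1.0], using the same
// arithmetic as the device snippet above.
HD inline int scale_uniform_to_int(float u, unsigned lo, unsigned hi) {
    assert(lo <= hi);
    float scaled = u;
    scaled *= (hi - lo + 0.999999);  // hi - lo + 1 bins, just short of the next integer
    scaled += lo;                    // shift so the first bin starts at lo
    return (int)truncf(scaled);
}
```

For the question's case, a thread could call scale_uniform_to_int(curand_uniform(&state[idx]), 1, max_per_thread[idx]), where max_per_thread is a hypothetical array holding each thread's upper bound. The endpoint checks were only reasoned through for small ranges such as 1 to 8; wide spans or large lower bounds lose the fractional margin in single precision and would need re-checking.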
Here's a fully worked example:
$ cat t527.cu
#include <stdio.h>
#include <curand.h>
#include <curand_kernel.h>
#include <math.h>
#include <assert.h>
#define MIN 2
#define MAX 7
#define ITER 10000000
// Give each thread its own RNG state: same seed, different sequence number.
__global__ void setup_kernel(curandState *state){
  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  curand_init(1234, idx, 0, &state[idx]);
}
// Draw n integers in [min_rand_int[idx], max_rand_int[idx]] and histogram them.
__global__ void generate_kernel(curandState *my_curandstate, const unsigned int n, const unsigned *max_rand_int, const unsigned *min_rand_int, unsigned int *result){
  int idx = threadIdx.x + blockDim.x*blockIdx.x;
  int count = 0;
  while (count < n){
    // curand_uniform returns a float in (0.0, 1.0]
    float myrandf = curand_uniform(my_curandstate+idx);
    myrandf *= (max_rand_int[idx] - min_rand_int[idx]+0.999999);
    myrandf += min_rand_int[idx];
    int myrand = (int)truncf(myrandf);
    assert(myrand <= max_rand_int[idx]);
    assert(myrand >= min_rand_int[idx]);
    // only one thread is launched, so a plain increment is enough here
    result[myrand-min_rand_int[idx]]++;
    count++;
  }
}
int main(){
  // one RNG state, one range, one histogram: the test runs a single thread
  curandState *d_state;
  cudaMalloc(&d_state, sizeof(curandState));
  unsigned *d_result, *h_result;
  unsigned *d_max_rand_int, *h_max_rand_int, *d_min_rand_int, *h_min_rand_int;
  cudaMalloc(&d_result, (MAX-MIN+1) * sizeof(unsigned));
  h_result = (unsigned *)malloc((MAX-MIN+1)*sizeof(unsigned));
  cudaMalloc(&d_max_rand_int, sizeof(unsigned));
  h_max_rand_int = (unsigned *)malloc(sizeof(unsigned));
  cudaMalloc(&d_min_rand_int, sizeof(unsigned));
  h_min_rand_int = (unsigned *)malloc(sizeof(unsigned));
  cudaMemset(d_result, 0, (MAX-MIN+1)*sizeof(unsigned));
  setup_kernel<<<1,1>>>(d_state);
  *h_max_rand_int = MAX;
  *h_min_rand_int = MIN;
  cudaMemcpy(d_max_rand_int, h_max_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice);
  cudaMemcpy(d_min_rand_int, h_min_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice);
  generate_kernel<<<1,1>>>(d_state, ITER, d_max_rand_int, d_min_rand_int, d_result);
  // this copy waits for the kernel to finish before reading the histogram
  cudaMemcpy(h_result, d_result, (MAX-MIN+1) * sizeof(unsigned), cudaMemcpyDeviceToHost);
  printf("Bin: Count: \n");
  for (int i = MIN; i <= MAX; i++)
    printf("%d %u\n", i, h_result[i-MIN]);
  return 0;
}
$ nvcc -arch=sm_20 -o t527 t527.cu -lcurand
$ cuda-memcheck ./t527
========= CUDA-MEMCHECK
Bin: Count:
2 1665496
3 1668130
4 1667644
5 1667435
6 1665026
7 1666269
========= ERROR SUMMARY: 0 errors
$
Upvotes: 26
Reputation: 59
@Robert's example doesn't generate a perfectly uniform distribution (although all the numbers in the range are generated and all the generated numbers are in the range). Both the smallest and the largest value are only half as likely to be chosen as the other numbers in the range.
At step 2, you should multiply by the number of values in the range: (largest value - smallest value + 0.999999). *
At step 3, the offset should be (+ smallest value) instead of (+ smallest value + 0.5).
Steps 1 and 4 remain the same.
*As @Kamil Czerski noted, 1.0 is included in the distribution. Adding 1.0 instead of 0.999999 would sometimes result in a number outside of the desired range.
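The half-probability effect can be reproduced on the host. The sketch below rests on an assumption: that the criticised version scaled by (largest value - smallest value) and added (smallest value + 0.5) before truncating, which is what the corrections to steps 2 and 3 imply. The names histogram and bin_ratio are illustrative, an evenly spaced grid over (0, 1] stands in for curand_uniform(), and double precision is used so the grid itself adds no visible rounding.

```cpp
#include <cassert>
#include <math.h>
#include <vector>

// Histogram of trunc(u * scale + offset) over [lo, hi] while u sweeps an
// evenly spaced grid on (0, 1].
inline std::vector<long> histogram(int lo, int hi, double scale, double offset, int samples) {
    assert(lo <= hi && samples > 0);
    std::vector<long> counts(hi - lo + 1, 0);
    for (int k = 1; k <= samples; ++k) {
        double u = static_cast<double>(k) / samples;  // u in (0, 1]
        int value = static_cast<int>(trunc(u * scale + offset));
        if (value >= lo && value <= hi) counts[value - lo]++;
    }
    return counts;
}

// Count in bin i divided by count in bin j; 1.0 means equally likely.
inline double bin_ratio(const std::vector<long>& counts, int i, int j) {
    return static_cast<double>(counts[i]) / counts[j];
}
```

For the range 2 to 7, histogram(2, 7, 5.0, 2.5, 600000) models the assumed earlier scheme and gives end bins with about half the count of the inner bins, while histogram(2, 7, 5.999999, 2.0, 600000) models the corrected scheme and gives bins of essentially equal size.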
Upvotes: 4