Reputation: 152
I want to generate pseudo-random numbers on a CUDA device deterministically: if I run the program twice, I expect exactly the same results, given that the program uses a hardcoded seed. Following the example provided by NVIDIA (https://docs.nvidia.com/cuda/curand/device-api-overview.html#device-api-example), I would expect exactly this behavior.
However, I get different results when running the exact same code multiple times. Is there a way to get pseudo-random numbers deterministically, as described?
The following example code shows my problem:
#include <iostream>
#include <cuda.h>
#include <curand_kernel.h>

__global__ void setup_kernel(curandState *state)
{
    auto id = threadIdx.x + blockIdx.x * blockDim.x;
    curand_init(123456, id, 0, &state[id]);
}

__global__ void draw_numbers(curandState *state, float* results)
{
    auto id = threadIdx.x + blockIdx.x * blockDim.x;
    // Copy state
    curandState localState = state[id % 1024];
    // Generate random number
    results[id] = curand_uniform(&localState);
    // Copy back state
    state[id % 1024] = localState;
}

int main(int argc, char* argv[])
{
    // Setup
    curandState* dStates;
    cudaMalloc((void **) &dStates, sizeof(curandState) * 1024);
    setup_kernel<<<1024, 1>>>(dStates);
    // Random numbers
    float* devResults;
    cudaMalloc((void **) &devResults, sizeof(float) * 16 * 1024);
    float *hostResults = (float*) calloc(16 * 1024, sizeof(float));
    // Call draw random numbers
    draw_numbers<<<1024, 16>>>(dStates, devResults);
    // Copy results
    cudaMemcpy(hostResults, devResults, 16 * 1024 * sizeof(float), cudaMemcpyDeviceToHost);
    // Output number 12345
    ::std::cout << "12345 is: " << hostResults[12345] << ::std::endl;
    return 0;
}
Compiling and running the code produces different output on my machine:
$ nvcc -std=c++11 curand.cu && ./a.out && ./a.out && ./a.out
12345 is: 0.8059
12345 is: 0.53454
12345 is: 0.382981
As I said, I would expect the same output all three times in this example.
Upvotes: 0
Views: 2021
Reputation: 152
curand_uniform does depend deterministically on the state it is given. Thanks to the comments by Robert Crovella, I now see that my error was relying on the thread execution order: in the question's code, 16 threads share each state through id % 1024, and the order in which they read and update it is not guaranteed. Simply not writing the state back is not an option for me either, because then every call of the draw_numbers kernel would produce the same "random" numbers.
I think the best solution in my case is to launch only 1024 threads (as many as there are curandState objects set up) and to generate multiple random numbers in each thread (16 per thread in my example). This way I get different random numbers across multiple calls within the program, but the same numbers on every program launch.
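To make the sharing concrete, here is a small host-only sketch of the index arithmetic from the question's launch configuration (1024 blocks of 16 threads, 1024 states). The helper name threads_sharing_slot and the constants are hypothetical and introduced only for illustration; no kernel is launched.

```cuda
#include <vector>

// Launch configuration taken from the question: 1024 blocks of 16 threads,
// but only 1024 curandState slots.
constexpr int kStates = 1024;
constexpr int kBlocks = 1024;
constexpr int kThreadsPerBlock = 16;

// Hypothetical helper: every global thread id that reads and writes
// state[slot] under the question's indexing scheme (id % 1024).
std::vector<int> threads_sharing_slot(int slot)
{
    std::vector<int> ids;
    for (int id = slot; id < kBlocks * kThreadsPerBlock; id += kStates)
        ids.push_back(id);
    return ids;
}

// Usage (host code):
//   for (int id : threads_sharing_slot(12345 % kStates))
//       std::printf("thread %d, block %d\n", id, id / kThreadsPerBlock);
```

For output element 12345 the slot is 12345 % 1024 = 57, and the 16 threads that touch it (57, 1081, ..., 15417) all sit in different blocks. Since CUDA does not guarantee the order in which blocks run, the value that thread 12345 draws depends on how many of the other 15 threads updated state[57] before it.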
Updated code:
#include <iostream>
#include <cuda.h>
#include <curand_kernel.h>

__global__ void setup_kernel(curandState *state)
{
    auto id = threadIdx.x + blockIdx.x * blockDim.x;
    curand_init(123456, id, 0, &state[id]);
}

__global__ void draw_numbers(curandState *state, float* results, int runs)
{
    auto id = threadIdx.x + blockIdx.x * blockDim.x;
    // Copy state
    curandState localState = state[id];
    // Generate random numbers
    for (int i = 0; i < runs; ++i)
    {
        results[id + i * 1024] = curand_uniform(&localState);
    }
    // Copy back state
    state[id] = localState;
}

int main(int argc, char* argv[])
{
    // Setup
    curandState* dStates;
    cudaMalloc((void **) &dStates, sizeof(curandState) * 1024);
    setup_kernel<<<1024, 1>>>(dStates);
    // Random numbers
    float* devResults;
    cudaMalloc((void **) &devResults, sizeof(float) * 16 * 1024);
    float *hostResults = (float*) calloc(16 * 1024, sizeof(float));
    // Call draw random numbers
    draw_numbers<<<16, 64>>>(dStates, devResults, 16);
    // Copy results
    cudaMemcpy(hostResults, devResults, 16 * 1024 * sizeof(float), cudaMemcpyDeviceToHost);
    // Output number 12345
    ::std::cout << "12345 is " << hostResults[12345];
    // Call draw random numbers (again)
    draw_numbers<<<16, 64>>>(dStates, devResults, 16);
    // Copy results
    cudaMemcpy(hostResults, devResults, 16 * 1024 * sizeof(float), cudaMemcpyDeviceToHost);
    // Output number 12345 again
    ::std::cout << " and " << hostResults[12345] << ::std::endl;
    return 0;
}
This produces the following output:
$ nvcc -std=c++11 curand.cu && ./a.out && ./a.out && ./a.out
12345 is 0.164181 and 0.295907
12345 is 0.164181 and 0.295907
12345 is 0.164181 and 0.295907
which is exactly what my use case requires.
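If one thread per output value is preferred over a loop in each thread, a possible alternative is to store no state at all and let every thread build its own generator from the seed, its global id, and a call counter. The following is only a sketch under assumptions, not what I actually use: it swaps the default generator for cuRAND's Philox state type (curandStatePhilox4_32_10_t), on the assumption that its curand_init is cheap for large subsequence numbers, and the names draw_numbers_stateless, draw, and call are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Each thread derives a private generator from (seed, thread id, call index),
// so no state is shared between threads or kept between kernel launches.
__global__ void draw_numbers_stateless(float *results, int n,
                                       unsigned long long seed,
                                       unsigned long long call)
{
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    if (id >= n) return;
    curandStatePhilox4_32_10_t local;
    // subsequence = id : a separate stream for every thread
    // offset = call    : skip the values drawn by earlier calls
    //                    (assumes exactly one draw per thread per call)
    curand_init(seed, id, call, &local);
    results[id] = curand_uniform(&local);
}

// Hypothetical host helper: draws n values for the given call index.
std::vector<float> draw(int n, unsigned long long seed, unsigned long long call)
{
    std::vector<float> host(n);
    float *dev = nullptr;
    cudaMalloc((void **) &dev, n * sizeof(float));
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    draw_numbers_stateless<<<blocks, threads>>>(dev, n, seed, call);
    // cudaMemcpy waits for the kernel to finish before copying
    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Calling draw(16 * 1024, 123456, 0) and then draw(16 * 1024, 123456, 1) would correspond to the two kernel calls above; the result of each call depends only on the seed, the thread id, and the call index, not on the execution order. The numbers will differ from those in my output, since a different generator and a different mapping of threads to sequences are used.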
Upvotes: 2