Reputation: 836
I have a problem generating numbers with the MTGP32 generator. I have already worked with XORWOW and MRG32k3a, so I proceeded the same way: on entering a kernel, I copy the state into a local variable and then work on that copy. Here I tried the same thing, but the generator keeps returning the same random numbers, whereas when I work through a pointer everything is fine. Here is the code with the copy:
__global__ void generate_kernel( curandStateMtgp32 *state,
                                 int n )
{
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    float x;
    curandStateMtgp32 localState = state[blockIdx.x];
    /* Generate pseudo-random normal variable */
    for(int i = 0; i < n; i++) {
        x = curand_normal( &localState );
        printf("tid: %d x: %f\n", id, x);
    }
}
And here is the output (tid 1 and tid 2 get the same results each time):
tid: 0 x: 0.207837
tid: 1 x: -0.091346
tid: 2 x: 0.294019
tid: 0 x: 2.684819
tid: 1 x: -0.091346
tid: 2 x: 0.294019
tid: 0 x: 1.433268
tid: 1 x: -0.091346
tid: 2 x: 0.294019
When I work through a pointer instead, the results are correct. Here is the code:
__global__ void generate_kernel( curandStateMtgp32 *state,
                                 int n )
{
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    float x;
    curandStateMtgp32 * localState = &state[blockIdx.x];
    /* Generate pseudo-random normal variable */
    for(int i = 0; i < n; i++) {
        x = curand_normal( localState );
        printf("tid: %d x: %f\n", id, x);
    }
}
and the results are:
tid: 0 x: 0.207837
tid: 1 x: -0.091346
tid: 2 x: 0.294019
tid: 0 x: 2.684819
tid: 1 x: -1.183960
tid: 2 x: -0.621348
tid: 0 x: 1.433268
tid: 1 x: 0.571323
tid: 2 x: -0.735758
Can someone explain what I'm doing wrong, or whether this is a compiler bug? I don't understand why, when I work with a copy of the state, the first thread gets different numbers while the others don't.
Thank you.
I can post the whole code if you want to test it yourself.
I'm working with Red Hat 6.x, a K20Xm GPU, and CUDA 5.5.
compilation line: nvcc -arch=sm_35 -lcurand x.cu
Upvotes: 0
Views: 497
Reputation: 354
Read how device-side Mersenne Twister (MTGP32) generation works in the cuRAND documentation. In particular: "One complete state of an MTGP32 sequence is defined by 351 32-bit integers. Each thread T(m) operates on one of these integers, s(n+m) combining it with s(n+m+1) and a pickup element s(n+m+p), where p <= 95. It stores the new state at position s(n+m+351) in the state array. After thread synchronization, the base index n is advanced by the number of threads that have updated the state." Unlike XORWOW, you cannot give each thread a local copy of the state; all threads in a block work on the same state cooperatively.
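To make this concrete, here is a minimal sketch of the pointer-based pattern, where every thread in a block passes the same shared state to the generator and nobody takes a private copy. The launch configuration (1 block of 3 threads, 3 draws), the seed 1234, the `out` buffer, and the `CHECK_*` macros are assumptions chosen to mirror the question, not something taken from the original code. Note that every thread in the block must make the same number of calls, since the threads update the state together.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <curand_kernel.h>
#include <curand_mtgp32_host.h>       // curandMakeMTGP32Constants, curandMakeMTGP32KernelState
#include <curand_mtgp32dc_p_11213.h>  // pre-computed parameter sets (mtgp32dc_params_fast_11213)

// Hypothetical error-checking helpers for this sketch.
#define CHECK_CUDA(x) do { if ((x) != cudaSuccess) { \
    printf("CUDA error at %s:%d\n", __FILE__, __LINE__); exit(EXIT_FAILURE); } } while (0)
#define CHECK_CURAND(x) do { if ((x) != CURAND_STATUS_SUCCESS) { \
    printf("cuRAND error at %s:%d\n", __FILE__, __LINE__); exit(EXIT_FAILURE); } } while (0)

// One MTGP32 state per block. Every thread passes the SAME pointer;
// no thread takes a local copy of the state.
__global__ void generate_kernel(curandStateMtgp32 *state, int n, float *out)
{
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    int total = blockDim.x * gridDim.x;
    for (int i = 0; i < n; i++) {
        // All threads of the block reach this call in every iteration.
        out[i * total + id] = curand_normal(&state[blockIdx.x]);
    }
}

int main()
{
    const int blocks = 1, threads = 3, n = 3;  // assumed sizes, mirroring the question
    const int count = blocks * threads * n;

    curandStateMtgp32 *devStates;
    mtgp32_kernel_params *devParams;
    float *devOut;
    float hostOut[count];

    CHECK_CUDA(cudaMalloc((void **)&devStates, blocks * sizeof(curandStateMtgp32)));
    CHECK_CUDA(cudaMalloc((void **)&devParams, sizeof(mtgp32_kernel_params)));
    CHECK_CUDA(cudaMalloc((void **)&devOut, count * sizeof(float)));

    // Upload the parameter tables, then initialise one state per block.
    CHECK_CURAND(curandMakeMTGP32Constants(mtgp32dc_params_fast_11213, devParams));
    CHECK_CURAND(curandMakeMTGP32KernelState(devStates, mtgp32dc_params_fast_11213,
                                             devParams, blocks, 1234));

    generate_kernel<<<blocks, threads>>>(devStates, n, devOut);
    CHECK_CUDA(cudaGetLastError());
    CHECK_CUDA(cudaMemcpy(hostOut, devOut, count * sizeof(float), cudaMemcpyDeviceToHost));

    for (int i = 0; i < n; i++)
        for (int id = 0; id < blocks * threads; id++)
            printf("tid: %d x: %f\n", id, hostOut[i * blocks * threads + id]);

    cudaFree(devOut);
    cudaFree(devParams);
    cudaFree(devStates);
    return 0;
}
```

With a private copy, each thread only sees the state updates it made itself. The output in the question, where only tid 0 progresses, would be consistent with a single thread being the one that advances the base index, although that is an inference from the output and not something the quoted passage states.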
Upvotes: 3