Jeff Beougher

Reputation: 11

CUDA random numbers are not always 0 or 1

I am trying to generate a set of random numbers that are only 1 or 0. The code below almost works. When I print the results in the for loop, I notice that sometimes a value comes out that is neither 1 nor 0. I know I am missing something, I am just not sure what. I think it is a memory misplacement.

#include <stdio.h>
#include <curand.h>
#include <curand_kernel.h>
#include <math.h>
#include <assert.h>
#include <stdlib.h> // malloc, free, system
#include <time.h>   // time
#define MIN 1
#define MAX (2048*20)

#define MOD 2 // only need one and zero for each random value.
#define THREADS_PER_BLOCK 256

__global__ void setup_kernel(curandState *state, unsigned long seed)
{
  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  curand_init(seed, idx, 0, state+idx);
}

__global__ void generate_kernel(curandState *state,  unsigned int *result){

  int idx = threadIdx.x + blockDim.x*blockIdx.x;
   result[idx] = curand(state+idx) % MOD;
}

int main(){

  curandState *d_state;
  cudaMalloc(&d_state, sizeof(curandState));

  unsigned *d_result, *h_result;
  cudaMalloc(&d_result, (MAX-MIN+1) * sizeof(unsigned));
  h_result = (unsigned *)malloc((MAX-MIN+1)*sizeof(unsigned));

  cudaMemset(d_result, 0, (MAX-MIN+1)*sizeof(unsigned));

  setup_kernel<<<MAX/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>(d_state,time(NULL));

  generate_kernel<<<MAX/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>(d_state, d_result);

  cudaMemcpy(h_result, d_result, (MAX-MIN+1) * sizeof(unsigned), cudaMemcpyDeviceToHost);  

  printf("Bin:    Count: \n");
  for (int i = MIN; i <= MAX; i++)
    printf("%d    %d\n", i, h_result[i-MIN]);

  free(h_result);
  cudaFree(d_result);

  system("pause");
  return 0;
}

What I am attempting to do is port the genetic algorithm from this site to CUDA.

http://www.ai-junkie.com/ga/intro/gat3.html

I thought it would be a good problem to learn CUDA and have some fun at the same time.

The first part is to generate my random array.

Upvotes: 1

Views: 439

Answers (1)

talonmies

Reputation: 72349

The problem here is that neither your setup_kernel nor your generate_kernel runs to completion, because of out-of-bounds memory access. Both kernels expect a generator state for each thread, but you are only allocating a single state on the device. This results in out-of-bounds memory reads and writes in both kernels. Once a kernel has faulted, the later cudaMemcpy can fail as well, in which case h_result is never filled and the loop prints whatever happened to be in the uninitialized malloc buffer. Change this:

curandState *d_state;
cudaMalloc(&d_state, sizeof(curandState));

to something like

curandState *d_state;
cudaMalloc(&d_state, sizeof(curandState) * (MAX-MIN+1));

so that you have one generator state per thread you are running, and things should start working. If you had made any attempt at checking errors from either the runtime API return statuses or using cuda-memcheck, the source of the error would have been immediately apparent.
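
For illustration, here is a minimal sketch of what checking the runtime API return statuses could look like for this program. The CUDA_CHECK macro and the N constant are hypothetical names introduced for this example, not part of the original code or of the CUDA API; the runtime calls themselves (cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString) are standard. It assumes N is a multiple of THREADS_PER_BLOCK, as it is in the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <curand_kernel.h>

// Hypothetical helper (not in the original post): print the error string
// and abort if a CUDA runtime call returns anything but cudaSuccess.
#define CUDA_CHECK(call)                                        \
  do {                                                          \
    cudaError_t err_ = (call);                                  \
    if (err_ != cudaSuccess) {                                  \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
              cudaGetErrorString(err_), __FILE__, __LINE__);    \
      exit(EXIT_FAILURE);                                       \
    }                                                           \
  } while (0)

#define N (2048 * 20)          // assumed element count, one thread each
#define THREADS_PER_BLOCK 256  // N must be a multiple of this

__global__ void setup_kernel(curandState *state, unsigned long long seed)
{
  int idx = threadIdx.x + blockDim.x * blockIdx.x;
  curand_init(seed, idx, 0, state + idx);
}

__global__ void generate_kernel(curandState *state, unsigned int *result)
{
  int idx = threadIdx.x + blockDim.x * blockIdx.x;
  result[idx] = curand(state + idx) % 2;
}

int main()
{
  curandState *d_state;
  unsigned int *d_result;

  // One generator state per thread, as described above.
  CUDA_CHECK(cudaMalloc(&d_state, N * sizeof(curandState)));
  CUDA_CHECK(cudaMalloc(&d_result, N * sizeof(unsigned int)));

  setup_kernel<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(
      d_state, (unsigned long long)time(NULL));
  CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
  CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

  generate_kernel<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_state, d_result);
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaDeviceSynchronize());

  unsigned int *h_result = (unsigned int *)malloc(N * sizeof(unsigned int));
  if (h_result == NULL) return EXIT_FAILURE;
  CUDA_CHECK(cudaMemcpy(h_result, d_result, N * sizeof(unsigned int),
                        cudaMemcpyDeviceToHost));

  // Count any value that is neither 0 nor 1.
  int bad = 0;
  for (int i = 0; i < N; i++)
    if (h_result[i] > 1) bad++;
  printf("values outside {0,1}: %d\n", bad);

  free(h_result);
  CUDA_CHECK(cudaFree(d_result));
  CUDA_CHECK(cudaFree(d_state));
  return 0;
}
```

With the original single-state allocation, an out-of-bounds access is not guaranteed to surface as a runtime error on every device, which is why running the program under cuda-memcheck is the more reliable of the two checks.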

Upvotes: 4
