mreff555

Reputation: 1101

Trouble with a basic CUDA program: code or compiler?

I'm taking an online parallel programming course. The homework is done within a virtual machine on their site. My first assignment (below) ran as it should there, squaring the numbers from 0 to ARRAY_SIZE - 1. When I try to run it on my own machine, I get some strange values. I can't find anything wrong with the code. Any suggestions? (Output on my machine is posted below.)

And yes, I am aware that my kernel is called cube even though I am only squaring the number. I just never changed it.

#include <stdio.h>

__global__ void cube( float* d_in, float* d_out ){
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = f*f;
}

int main(){
    const int ARRAY_SIZE = 8;
    const int ARRAY_BYTES = ARRAY_SIZE * sizeof(float);

    // Host memory
    float h_in[ARRAY_SIZE];
    float h_out[ARRAY_SIZE];
    for( int i = 0; i < ARRAY_SIZE; i++ )
        h_in[i] = (float)i;

    // Device memory pointers
    float* d_in;
    float* d_out;

    // Allocate device memory
    cudaMalloc( (void**) &d_in, ARRAY_BYTES );
    cudaMalloc( (void**) &d_out, ARRAY_BYTES );

    // Transfer input to device
    cudaMemcpy( d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice );

    // Launch the kernel
    cube<<<1,ARRAY_SIZE>>>(d_out,d_in);

    // Transfer device to host
    cudaMemcpy( h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost );

    for(int i = 0; i < ARRAY_SIZE; i++)
        printf("%f\n",h_out[i]);

    // Free memory
    cudaFree(d_in);
    cudaFree(d_out);

    return 0;
}

Output on my machine:

dan@mojo:~/Dropbox/code/gpu_programming$ nvcc -o first first.cu 
dan@mojo:~/Dropbox/code/gpu_programming$ ./first
-0.000000
-nan
-0.000000
-nan
-0.000000
nan
-nan
-nan

Upvotes: 1

Views: 83

Answers (1)

bendervader

Reputation: 2660

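The kernel is declared as cube(d_in, d_out), but the launch passes (d_out, d_in). As written, the kernel reads from the uninitialized output buffer and writes its results into the input buffer, so the values copied back from d_out are whatever happened to be in that memory. Switch the order of the arguments when launching the kernel, i.e.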

cube<<<1,ARRAY_SIZE>>>(d_in, d_out);
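
For reference, here is a minimal sketch of the whole program with the launch arguments in the right order. It is not the course's reference solution. The kernel is renamed to square and its input is marked const purely for readability, and CUDA_CHECK is a hypothetical helper macro (not part of the CUDA API) that wraps the runtime calls. Error checking would not have caught the swapped arguments, since that launch is perfectly valid, but it does help rule out driver or toolkit problems when a program behaves differently on a new machine.

```cuda
#include <stdio.h>
#include <stdlib.h>

// CUDA_CHECK is a hypothetical helper name, not something CUDA provides.
// It aborts with a message if a runtime call returns an error.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err));                     \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// One thread per element: read from d_in, write the square to d_out.
__global__ void square(const float* d_in, float* d_out) {
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = f * f;
}

int main() {
    const int ARRAY_SIZE = 8;
    const int ARRAY_BYTES = ARRAY_SIZE * sizeof(float);

    // Host memory
    float h_in[ARRAY_SIZE];
    float h_out[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; i++)
        h_in[i] = (float)i;

    // Device memory
    float* d_in;
    float* d_out;
    CUDA_CHECK(cudaMalloc((void**)&d_in, ARRAY_BYTES));
    CUDA_CHECK(cudaMalloc((void**)&d_out, ARRAY_BYTES));
    CUDA_CHECK(cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice));

    // Arguments follow the kernel signature: input first, then output.
    square<<<1, ARRAY_SIZE>>>(d_in, d_out);
    CUDA_CHECK(cudaGetLastError());  // reports launch failures, if any

    // This blocking copy also waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost));

    for (int i = 0; i < ARRAY_SIZE; i++)
        printf("%f\n", h_out[i]);

    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

Assuming a working GPU and driver, building this with nvcc as in the question should print the squares of 0 through 7. If CUDA_CHECK reports an error instead, the problem is with the local setup rather than the code.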

Upvotes: 2
