ZviBar

Reputation: 10068

cudaFree is not freeing memory

The code below calculates the dot product of two vectors a and b. The correct result is 8192. When I run it for the first time the result is correct. Then when I run it for the second time the result is the previous result + 8192 and so on:

1st iteration: result = 8192
2nd iteration: result = 8192 + 8192
3rd iteration: result = 8192 + 8192 + 8192
and so on.

I checked by printing it on screen, and the device variable dev_c is not freed. What's more, writing to it behaves like a sum: the result is the previous value plus the new one being written. I guess that could have something to do with the atomicAdd() operation, but nonetheless cudaFree(dev_c) should erase it after all.

#define N 8192
#define THREADS_PER_BLOCK 512
#define NUMBER_OF_BLOCKS (N/THREADS_PER_BLOCK)
#include <stdio.h>


__global__ void dot( int *a, int *b, int *c ) {

    __shared__ int temp[THREADS_PER_BLOCK];

    int index = threadIdx.x + blockIdx.x * blockDim.x;

    temp[threadIdx.x] = a[index] * b[index];

    __syncthreads();

    if( 0 == threadIdx.x ) {

        int sum = 0;
        for( int i = 0; i < THREADS_PER_BLOCK; i++ ) {
            sum += temp[i];
        }
        atomicAdd(c,sum);
    }
}

    int main( void ) {

        int *a, *b, *c;
        int *dev_a, *dev_b, *dev_c; 
        int size = N * sizeof( int); 

        cudaMalloc( (void**)&dev_a, size );
        cudaMalloc( (void**)&dev_b, size );
        cudaMalloc( (void**)&dev_c, sizeof(int));

        a = (int*)malloc(size);
        b = (int*)malloc(size);
        c = (int*)malloc(sizeof(int));

        for(int i = 0 ; i < N ; i++){
            a[i] = 1;
            b[i] = 1;
        }

        cudaMemcpy( dev_a, a, size, cudaMemcpyHostToDevice);
        cudaMemcpy( dev_b, b, size, cudaMemcpyHostToDevice);

        dot<<< N/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>( dev_a, dev_b, dev_c);

        cudaMemcpy( c, dev_c, sizeof(int) , cudaMemcpyDeviceToHost);

        printf("Dot product = %d\n", *c);

        cudaFree(dev_a);
        cudaFree(dev_b);
        cudaFree(dev_c);    

        free(a); 
        free(b); 
        free(c);

        return 0;

    }

Upvotes: 4

Views: 12964

Answers (2)

Joachim Isaksson

Reputation: 180867

From the documentation of cudaMalloc():

The memory is not cleared.

That means that dev_c is not initialized, and your atomicAdd(c,sum); will add to any random value that happens to be stored in memory at the returned position.
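A minimal sketch of one way to apply this, assuming the kernel and sizes from the question: zero dev_c with cudaMemset before the launch so that atomicAdd starts from a known value. Error checking is omitted for brevity, and copying a host-side zero with cudaMemcpy would work equally well.

```cuda
#include <stdio.h>
#include <stdlib.h>

#define N 8192
#define THREADS_PER_BLOCK 512

// Same kernel as in the question: one partial sum per block,
// accumulated into *c with atomicAdd.
__global__ void dot(int *a, int *b, int *c) {
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];
    __syncthreads();

    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++) {
            sum += temp[i];
        }
        atomicAdd(c, sum);
    }
}

int main(void) {
    int size = N * sizeof(int);
    int *a = (int *)malloc(size);
    int *b = (int *)malloc(size);
    int c = 0;
    int *dev_a, *dev_b, *dev_c;

    for (int i = 0; i < N; i++) {
        a[i] = 1;
        b[i] = 1;
    }

    cudaMalloc((void **)&dev_a, size);
    cudaMalloc((void **)&dev_b, size);
    cudaMalloc((void **)&dev_c, sizeof(int));

    cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);

    // The fix: give the accumulator a defined starting value
    // before any block adds its partial sum to it.
    cudaMemset(dev_c, 0, sizeof(int));

    dot<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(dev_a, dev_b, dev_c);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("Dot product = %d\n", c);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    free(a);
    free(b);
    return 0;
}
```

With the accumulator zeroed on every run, the result no longer depends on what a previous run left behind in that memory location.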

Upvotes: 6

Robert Crovella

Reputation: 151799

cudaFree doesn't erase anything; it simply returns memory to a pool to be re-allocated. cudaMalloc doesn't guarantee the value of memory that has been allocated. You need to initialize memory (both global and shared) that your program uses in order to have consistent results. The same is true for malloc and free, by the way.
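To illustrate the shared-memory half of that advice, here is a hedged sketch of a variant kernel. The name dot_n, the length parameter n, and the value 1000 are hypothetical and not part of the question's code. When the vector length is not a multiple of the block size, threads past the end still write a 0 into their shared slot, so the per-block sum never reads an uninitialized value.

```cuda
#include <stdio.h>
#include <stdlib.h>

#define THREADS_PER_BLOCK 512

// Hypothetical variant of the question's kernel that takes the length n.
// It must be launched with blockDim.x == THREADS_PER_BLOCK.
__global__ void dot_n(const int *a, const int *b, int *c, int n) {
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;

    // Every thread writes its slot: a product if in range, 0 otherwise.
    temp[threadIdx.x] = (index < n) ? a[index] * b[index] : 0;
    __syncthreads();

    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++) {
            sum += temp[i];
        }
        atomicAdd(c, sum);
    }
}

int main(void) {
    const int n = 1000;  // deliberately not a multiple of 512
    const int size = n * sizeof(int);
    const int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;

    int *a = (int *)malloc(size);
    int *b = (int *)malloc(size);
    int c = 0;
    int *dev_a, *dev_b, *dev_c;

    for (int i = 0; i < n; i++) {
        a[i] = 1;
        b[i] = 1;
    }

    cudaMalloc((void **)&dev_a, size);
    cudaMalloc((void **)&dev_b, size);
    cudaMalloc((void **)&dev_c, sizeof(int));

    cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);
    cudaMemset(dev_c, 0, sizeof(int));  // global accumulator initialized too

    dot_n<<<blocks, THREADS_PER_BLOCK>>>(dev_a, dev_b, dev_c, n);

    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("Dot product = %d\n", c);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    free(a);
    free(b);
    return 0;
}
```

Since every element is 1, the expected dot product here is simply n. The question's original kernel did not need the range check only because N happened to be an exact multiple of THREADS_PER_BLOCK.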

Upvotes: 12
