Dot product in Cuda by example does not work for me

Question

I'm starting to read the book "CUDA by Example" and I have a problem with the dot product example that uses shared memory. I copied the example from the book and set N = x * 1024, threadsPerBlock = 32, and blocksPerGrid = 8, then tested x = 2, 3, 4, and 5. With x = 3 the result is wrong, but with x = 2, 4, or 5 everything is fine. I don't understand where the problem is. The code is:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

#define imin(a, b) (a < b ? a : b)

// ...

    dot<<<blocksPerGrid, threadsPerBlock>>>(d_a, d_b, d_partial_c);

    cudaMemcpy(partial_c, d_partial_c, blocksPerGrid * sizeof(float), cudaMemcpyDeviceToHost);

    result = 0;
    for (int i = 0; i < blocksPerGrid; i++)
        result += partial_c[i];

    if (2 * sum_squares((float)(N - 1)) == result)
        printf(":)\n");
    else
        printf(":(\n");

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_partial_c);

    free(a);
    free(b);
    free(partial_c);

    getchar();
    return 0;
}


Answers (1)
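
A likely cause, offered as an assumption since the kernel and the input setup are not shown above: the final check compares `float` values with `==`. If the inputs follow the book's pattern (a[i] = i, b[i] = 2 * i), the exact dot product for N = 3 * 1024 is 19317916672, which lies between 2^34 and 2^35, where adjacent `float` values are 2048 apart. The GPU adds the partial sums in a different order than the closed-form expression is evaluated, so the two roundings can differ by a few steps even when the kernel is correct. The sketch below shows a tolerance-based comparison and a double-precision host reference; `nearly_equal`, `dot_reference`, and the tolerance value are hypothetical names and choices, not part of the book's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not from the book): true when a and b agree
   to within rel_tol, measured relative to the larger magnitude. */
static bool nearly_equal(float a, float b, float rel_tol)
{
    assert(rel_tol >= 0.0f);
    float diff  = (a > b) ? (a - b) : (b - a);
    float abs_a = (a < 0.0f) ? -a : a;
    float abs_b = (b < 0.0f) ? -b : b;
    float scale = (abs_a > abs_b) ? abs_a : abs_b;
    return diff <= rel_tol * scale;
}

/* Hypothetical host reference, assuming a[i] = i and b[i] = 2 * i.
   Accumulating in double keeps every partial sum exact for the
   sizes used in the question. */
static double dot_reference(int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += (double)i * (2.0 * i);
    return sum;
}
```

With these helpers, the final check could read `if (nearly_equal(result, (float)dot_reference(N), 1e-5f))` instead of comparing with `==`.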
