Reputation: 43
I have a simple program with 3 arrays that counts how many times the third array is 0 while the first and second arrays hold the same value. When that condition is true, it increments an element of another array. The problems are:
If the kernel contains only the first if(), array A is always 0.
If I add the else if(), the values of array A are set to 0 after index 2, and the case where A, B and C are all 0 is not counted.
This is the code:
#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <stdlib.h>
#include <cuda_runtime_api.h>
// Kernel that executes on the CUDA device
__global__ void square_array(float *a, float *b, float *c, float *res)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (a[idx]=b[idx] && c[idx]==0) {
        res[0]++;
    }
    else if (a[idx]=b[idx] && c[idx]==1){
        res[1]++;
    }
}
// main routine that executes on the host
int main(void)
{
    float *a_h, *a_d; // Pointer to host & device arrays
    float *b_h, *b_d; // Pointer to host & device arrays
    float *c_h, *c_d; // Pointer to host & device arrays
    float *res_h, *res_d; // Pointer to host & device arrays
    const int N = 10; // Number of elements in arrays
    size_t size = N * sizeof(float);
    //size_t size_s = 4 * sizeof(float);
    a_h = (float *)malloc(size); // Allocate array on host
    cudaMalloc((void **) &a_d, size); // Allocate array on device
    b_h = (float *)malloc(size); // Allocate array on host
    cudaMalloc((void **) &b_d, size); // Allocate array on device
    c_h = (float *)malloc(size); // Allocate array on host
    cudaMalloc((void **) &c_d, size); // Allocate array on device
    res_h = (float *)malloc(size); // Allocate array on host
    cudaMalloc((void **) &res_d, size); // Allocate array on device
    // Initialize host array and copy it to CUDA device
    // for (int i=0; i<N; i++) a_h[i] = (float)i;
    for (int i=0; i<N; i++) a_h[i] = (float)i;
    for (int i=0; i<N; i++) b_h[i] = (float)i;
    for (int i=0; i<N; i++) c_h[i] = (float)i;
    for (int i=0; i<4; i++) res_h[i] = 0;
    cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b_h, size, cudaMemcpyHostToDevice);
    cudaMemcpy(c_d, c_h, size, cudaMemcpyHostToDevice);
    cudaMemcpy(res_d, res_h, size, cudaMemcpyHostToDevice);
    // Do calculation on device:
    int block_size = 8;
    int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
    square_array <<< n_blocks, block_size >>> (a_d, b_d, c_d, res_d);
    // Retrieve result from device and store it in host array
    cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
    cudaMemcpy(b_h, b_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
    cudaMemcpy(c_h, c_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
    cudaMemcpy(res_h, res_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
    // Print results
    for (int i=0; i<N; i++){
        printf("%f A \n", a_h[i]);
    }
    for (int i=0; i<N; i++){
        printf("%f B \n", b_h[i]);
    }
    for (int i=0; i<N; i++){
        printf("%f C \n", c_h[i]);
    }
    for (int i=0; i<4; i++){
        printf("%f res \n", res_h[i]);
    }
    // Cleanup
    free(a_h); cudaFree(a_d);
    free(b_h); cudaFree(b_d);
    free(c_h); cudaFree(c_d);
    free(res_h); cudaFree(res_d);
}
Upvotes: 0
Views: 675
Reputation: 9489
Aside from the = in if (a[idx]=b[idx] && c[idx]==0) { that should be ==, as you already found (and the same goes for the following else if statement), there are at least two other issues in your code:
You don't check that the thread index stays within the bounds of the arrays. Since you are launching 2 blocks of 8 threads, you have 16 threads accessing 10-element arrays. To avoid the issue, pass N as a parameter to your kernel and add an if ( idx < N ) guard somewhere.
You accumulate in res in parallel without any sort of protection, leading to all kinds of race conditions. This is a very typical histogram issue that is explained aplenty in the literature (web, books, CUDA examples...). A quick fix for you (albeit probably not the most efficient one) would be to use atomic operations, such as atomicAdd. In your case, the line res[0]++; would become atomicAdd( &res[0], 1 ); and res[1]++; would become (as you guessed) atomicAdd( &res[1], 1 );. atomicAdd on float requires compiling for compute capability 2.0 or higher.
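Putting the three fixes together (== instead of =, the bounds check, and atomicAdd), the kernel could look roughly like the sketch below. The names count_matches and count_on_device are made up for illustration, the result buffer is shrunk to the two counters actually used, and error checking on the CUDA calls is left out for brevity, so treat it as a starting point rather than a finished implementation.

```cuda
#include <cuda_runtime.h>

// Counts elements where a[i] == b[i], split by whether c[i] is 0 or 1.
// res[0] and res[1] must be zeroed before the launch.
__global__ void count_matches(const float *a, const float *b, const float *c,
                              float *res, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;                 // surplus threads in the last block do nothing

    if (a[idx] == b[idx]) {               // comparison (==), not assignment (=)
        if (c[idx] == 0.0f)      atomicAdd(&res[0], 1.0f);  // race-free increment
        else if (c[idx] == 1.0f) atomicAdd(&res[1], 1.0f);
    }
}

// Hypothetical host helper (not from the original post): copies the inputs
// to the device, launches the kernel, and copies the two counters back.
void count_on_device(const float *a_h, const float *b_h, const float *c_h,
                     int n, float *res_h /* room for 2 floats */)
{
    size_t size = n * sizeof(float);
    float *a_d, *b_d, *c_d, *res_d;
    cudaMalloc((void **)&a_d, size);
    cudaMalloc((void **)&b_d, size);
    cudaMalloc((void **)&c_d, size);
    cudaMalloc((void **)&res_d, 2 * sizeof(float));

    cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b_h, size, cudaMemcpyHostToDevice);
    cudaMemcpy(c_d, c_h, size, cudaMemcpyHostToDevice);
    cudaMemset(res_d, 0, 2 * sizeof(float));   // all-zero bytes represent 0.0f

    int block_size = 8;
    int n_blocks = (n + block_size - 1) / block_size;  // round up
    count_matches<<<n_blocks, block_size>>>(a_d, b_d, c_d, res_d, n);

    // A blocking cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(res_h, res_d, 2 * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(a_d); cudaFree(b_d); cudaFree(c_d); cudaFree(res_d);
}
```

With the data from the question (a, b and c all holding 0..9), index 0 is the only element with c equal to 0 and index 1 the only one with c equal to 1. The = typo also explains the zeros in A: && binds tighter than =, so a[idx]=b[idx] && c[idx]==0 stores the 0/1 result of the logical expression into a[idx] instead of comparing.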
HTH
Upvotes: 1
Reputation: 43
Sorry, I solved the problem. It was a typo: I wrote the assignment operator = instead of the comparison operator ==.
Upvotes: 0