Fatemeh Pooyan

Reputation: 13

Only one thread executes the CUDA kernel

I am new to GPU programming, specifically CUDA/C++. I have written a simple program that uses atomicAdd to increment every element of an array by 1.

But the output shows that only the first element of the array was incremented; the others stay the same. My code is as follows.

Thanks in advance for any help.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <iostream>
#include <fstream>
using namespace std;
__global__ void Histcount( int *a)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    {
        atomicAdd(&a[i], 1);
    }
}

int main()
{
    int * hostarray = new int[20];
    int * devarray;
    cudaError_t error;
    error=cudaMalloc(&devarray, sizeof(int) * 20);
    for (int i = 0; i < 20; i++)
    {
        hostarray[i] = i ;
    }
    cudaMemcpy((int *)devarray, (int *)hostarray, sizeof(int) * 20, cudaMemcpyHostToDevice);
    dim3 gs = (1, 1);
    dim3 bs = (20, 1, 1);
    Histcount <<<gs, bs >>>  (devarray);
    cudaMemcpy((int *)hostarray, (int *)devarray, sizeof(int) * 20, cudaMemcpyDeviceToHost);
    for (int i = 0; i < 20; i++)
    {
        cout << hostarray[i]<<endl;

    }
}

Upvotes: 1

Views: 343

Answers (1)

Robert Crovella

Reputation: 151799

This is not a valid way to specify dim3 variables:

dim3 gs = (1, 1);
dim3 bs = (20, 1, 1);

In fact, the compiler may be issuing warnings on those lines, and if so, you should not ignore them.

You should do either:

dim3 gs = dim3(1, 1);
dim3 bs = dim3(20, 1, 1);

or:

dim3 gs(1, 1);
dim3 bs(20, 1, 1);

The problem with your implementation is that the compiler doesn't know your actual intent with, for example:

(20, 1, 1)

Standing alone like that, the expression is an application of the C++ comma operator: the compiler (possibly with a warning) evaluates it to its last operand, 1, and then assigns that scalar to your dim3 variable. The dimensions you did not supply default to 1, so you end up with a block size of 1 and a grid size of 1 (which was not your intent), and your code ran only 1 thread overall.

Upvotes: 3
