kayush206

Reputation: 158

CUDA kernel is not accessing all the elements of an array

I have written a CUDA program that performs some operations on a large array. However, when I pass the array to a CUDA kernel, the threads do not access all of its elements. The simple program below illustrates my use case:

#include <stdio.h>
#include <stdlib.h>

__global__
void kernel(int n){
        int s = threadIdx.x + blockIdx.x*blockDim.x;  // global thread index
        int t = blockDim.x*gridDim.x;                 // total number of threads (grid stride)
        for(int i=s;i<n;i+=t){
                printf("%d\n",i);  // print the array index being accessed
        }
}

int main(void){
        int n = 10000; // array size
        int blockSize = 64;
        int numBlocks = (n + blockSize - 1) / blockSize;
        kernel<<<numBlocks, blockSize>>>(n);
        cudaDeviceSynchronize();
}

I've tried different values of blockSize (256, 128, 64, etc.), but it never prints all the indices of the array. Ideally, it should print some permutation of 0 to n-1; instead, it prints fewer than n numbers.

If numBlocks and blockSize are both 1, all the elements are accessed. All the elements are also accessed if the array size is less than 4096.

Upvotes: 3

Views: 977

Answers (2)

Ander Biguri

Reputation: 35525

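Use better debugging techniques! Your code works correctly. The version below writes 1 to every index the kernel visits and sums the array on the host, so the printed sum should be n (10000 here) if every element is reached: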

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>
#include <stdlib.h>

__global__
void kernel(int* in, int n){
    int s = threadIdx.x + blockIdx.x*blockDim.x;  // global thread index
    int t = blockDim.x*gridDim.x;                 // total number of threads (grid stride)
    for (int i = s; i < n; i += t){
        in[i] = 1;  // mark the array index being accessed
    }
}

int main(void){
    int n = 10000; // array size
    int blockSize = 64;
    int numBlocks = (n + blockSize - 1) / blockSize;
    int *d_res, *h_res;
    cudaMalloc(&d_res, n*sizeof(int));
    cudaMemset(d_res, 0, n*sizeof(int));  // start from zeros so unvisited elements add nothing
    h_res = (int*)malloc(n*sizeof(int));

    kernel<<<numBlocks, blockSize>>>(d_res, n);
    cudaDeviceSynchronize();
    cudaMemcpy(h_res, d_res, n*sizeof(int), cudaMemcpyDeviceToHost);

    // Every visited element contributes 1 to the sum
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += h_res[i];
    printf("%d\n", sum);

    cudaFree(d_res);
    free(h_res);
    return 0;
}
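
As a slightly stricter variant of the check above, one could count how many times each index is visited instead of summing a flag. This is only a sketch: the names countVisits and countBadIndices are made up for illustration, and it assumes a device that supports atomicAdd on int in global memory. It reports how many indices were not visited exactly once, which would also catch an index being visited twice.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Same grid-stride loop as the question, but each visit bumps a per-index counter.
__global__ void countVisits(int *visits, int n) {
    int start  = threadIdx.x + blockIdx.x * blockDim.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = start; i < n; i += stride) {
        atomicAdd(&visits[i], 1);
    }
}

// Hypothetical helper: returns the number of indices NOT visited exactly once,
// or -1 if an allocation or CUDA call fails.
int countBadIndices(int n, int blockSize) {
    int numBlocks = (n + blockSize - 1) / blockSize;
    size_t bytes = (size_t)n * sizeof(int);

    int *d_visits = NULL;
    if (cudaMalloc(&d_visits, bytes) != cudaSuccess) return -1;
    int *h_visits = (int *)malloc(bytes);
    if (h_visits == NULL) { cudaFree(d_visits); return -1; }

    cudaMemset(d_visits, 0, bytes);  // all counters start at zero
    countVisits<<<numBlocks, blockSize>>>(d_visits, n);

    // cudaMemcpy waits for the kernel and also surfaces launch/execution errors.
    cudaError_t err = cudaMemcpy(h_visits, d_visits, bytes, cudaMemcpyDeviceToHost);

    int bad = 0;
    if (err != cudaSuccess) {
        bad = -1;
    } else {
        for (int i = 0; i < n; i++) {
            if (h_visits[i] != 1) bad++;
        }
    }

    cudaFree(d_visits);
    free(h_visits);
    return bad;
}

int main(void) {
    int bad = countBadIndices(10000, 64);
    printf("indices not visited exactly once: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

A result of 0 would mean the kernel reached every index once, independent of anything printf shows on the console.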

Upvotes: 2

sgarizvi

Reputation: 16796

Actually, all of the values are being printed in this case, but you may not be able to see all of them because of the buffer limit of the output console. Try increasing the console's buffer size.

Additionally, keep in mind that printf calls inside a kernel execute out of order. There are also limits on the device-side printf buffer, which are explained in the documentation.
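
The device-side printf buffer can be inspected and resized with cudaDeviceGetLimit and cudaDeviceSetLimit using cudaLimitPrintfFifoSize. The following is a minimal sketch rather than a confirmed fix for this question: the 8 MiB size is an arbitrary assumption, and printIndices and raisePrintfFifo are hypothetical names. The limit has to be set before launching any kernel that calls printf.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same grid-stride printing kernel as in the question.
__global__ void printIndices(int n) {
    int start  = threadIdx.x + blockIdx.x * blockDim.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = start; i < n; i += stride) {
        printf("%d\n", i);
    }
}

// Hypothetical helper: requests a device printf FIFO of at least `bytes`
// and reports whether the runtime accepted the request.
bool raisePrintfFifo(size_t bytes) {
    if (cudaDeviceSetLimit(cudaLimitPrintfFifoSize, bytes) != cudaSuccess) return false;
    size_t actual = 0;
    if (cudaDeviceGetLimit(&actual, cudaLimitPrintfFifoSize) != cudaSuccess) return false;
    return actual >= bytes;
}

int main(void) {
    const int n = 10000;
    const int blockSize = 64;
    const int numBlocks = (n + blockSize - 1) / blockSize;

    // Report the current limit on stderr so it does not mix with the indices.
    size_t fifoBytes = 0;
    cudaDeviceGetLimit(&fifoBytes, cudaLimitPrintfFifoSize);
    fprintf(stderr, "printf FIFO before: %zu bytes\n", fifoBytes);

    // Assumption: 8 MiB is simply a generous size chosen for illustration.
    if (!raisePrintfFifo((size_t)8 * 1024 * 1024)) {
        fprintf(stderr, "could not raise the printf FIFO size\n");
        return 1;
    }

    printIndices<<<numBlocks, blockSize>>>(n);
    cudaDeviceSynchronize();  // the device printf buffer is flushed at synchronization
    return 0;
}
```

Redirecting the output to a file and counting its lines (for example with `./a.out > out.txt` followed by `wc -l out.txt`) avoids the console's own scrollback limit when checking whether all n indices were printed.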

Upvotes: 3
