Yuri  Schneider

Reputation: 61

CUDA blockDim.y is always 1

I always get blockDim.y == 1. No matter what value I set in numBlocks, I always get the same result.

__global__ void CalcVideo(unsigned char *original, unsigned char *candidate, int *answer)
{
    printf("block id.x = %d blockid.y=%d blockdim.x = %d blockdim.y = %d Thread id= %d \n", 
        blockIdx.x, blockIdx.y, blockDim.x, blockDim.y, threadIdx.x );
}

int ORIGINAL_FRAMES = 3;
int CANDIDATE_FRAMES = 2;
int FRAME_LENGHT = 3;

dim3 numBlocks(ORIGINAL_FRAMES, CANDIDATE_FRAMES);
dim3 threadsPerBlock(3);  // 3 threads per block

CalcVideo<<<numBlocks, threadsPerBlock>>>(original_device, candidate_device, answer_device);

The number of blocks in y is correct, but why does the program report the wrong blockDim.y?

block id.x = 1 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 1 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 1 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 2
block id.x = 1 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 1 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 1 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 2
block id.x = 0 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 0 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 0 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 2
block id.x = 0 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 0 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 0 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 2
block id.x = 2 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 2 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 2 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 2
block id.x = 2 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 2 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 2 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 2

Upvotes: 1

Views: 533

Answers (1)

blockDim stores the dimensions of one block. In your case, you're passing threadsPerBlock as the block dimension, and since dim3 components you leave unspecified default to 1, that makes it 3 x 1 x 1, so blockDim.y is 1. The first argument in the kernel launch configuration, numBlocks in your case, controls the dimensions of the grid of blocks, which you can access in the kernel as gridDim.
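To make the distinction concrete, here is a minimal sketch rather than a drop-in replacement for your kernel. The names Dims, reportDims and queryDims are made up for illustration, and it assumes a machine with a working CUDA device. One thread records gridDim and blockDim so the host can print them for two launch configurations:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper struct (not from the question): what one thread observed.
struct Dims {
    unsigned gridX, gridY, blockX, blockY;
};

// Thread (0,0) of block (0,0) records the built-in dimension variables.
__global__ void reportDims(Dims *out)
{
    if (blockIdx.x == 0 && blockIdx.y == 0 && threadIdx.x == 0 && threadIdx.y == 0) {
        out->gridX  = gridDim.x;   // blocks in x: first launch parameter
        out->gridY  = gridDim.y;   // blocks in y: first launch parameter
        out->blockX = blockDim.x;  // threads per block in x: second launch parameter
        out->blockY = blockDim.y;  // threads per block in y: second launch parameter
    }
}

// Hypothetical host helper: launches reportDims and copies the result back.
// Returns all zeros if the device allocation fails.
Dims queryDims(dim3 numBlocks, dim3 threadsPerBlock)
{
    Dims host = {0, 0, 0, 0};
    Dims *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(Dims)) != cudaSuccess) {
        return host;
    }
    cudaMemset(dev, 0, sizeof(Dims));
    reportDims<<<numBlocks, threadsPerBlock>>>(dev);
    // cudaMemcpy blocks until the kernel above has finished.
    cudaMemcpy(&host, dev, sizeof(Dims), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}

int main()
{
    // Same launch shape as in the question: 3 x 2 blocks, 3 x 1 threads per block.
    Dims a = queryDims(dim3(3, 2), dim3(3));
    printf("gridDim = %u x %u, blockDim = %u x %u\n",
           a.gridX, a.gridY, a.blockX, a.blockY);

    // Only a second component in threadsPerBlock changes blockDim.y.
    Dims b = queryDims(dim3(3, 2), dim3(3, 4));
    printf("gridDim = %u x %u, blockDim = %u x %u\n",
           b.gridX, b.gridY, b.blockX, b.blockY);
    return 0;
}
```

With the first configuration, gridDim.y is 2 and blockDim.y is 1, matching your output. blockDim.y only becomes 4 in the second one, where threadsPerBlock itself has a y component.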


Side note: I assume the extremely low number and size of blocks in the question are for testing purposes only, as they will leave any GPU extremely underutilised in practice.

Upvotes: 4
