Jeff Bencteux

Reputation: 1426

Maximum blocks number on a GTX TITAN

I'm trying to compute Fourier transforms using CUDA on an NVIDIA GTX TITAN graphics card. I run into a problem once the launch reaches a certain number of blocks.

Here is what my card tells me when using cudaGetDeviceProperties:

Here is the code I use to call my kernel function:

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);

// Launch configuration derived from the device properties
unsigned int threads = prop.maxThreadsPerBlock;
unsigned int max_blocks = prop.maxGridSize[0];
unsigned int blocks = (pixel_size + threads - 1) / threads;

// Hardware limit: clamp the grid to the maximum x dimension
if (blocks > max_blocks)
  blocks = max_blocks;

kernel_function <<<blocks, threads>>>(pixel_size);
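As a side note (this check is not in the original post): when a launch fails, for example because the grid is larger than the compiled architecture allows, the kernel never runs and the output buffer is left untouched, which can look like random garbage. A minimal sketch of how the launch above could be checked, assuming <cstdio> is included for printf:

kernel_function <<<blocks, threads>>>(pixel_size);

// A launch-configuration problem (e.g. too many blocks in x) shows up here
cudaError_t launch_err = cudaGetLastError();
if (launch_err != cudaSuccess)
  printf("launch failed: %s\n", cudaGetErrorString(launch_err));

// Errors raised while the kernel executes show up here
cudaError_t sync_err = cudaDeviceSynchronize();
if (sync_err != cudaSuccess)
  printf("kernel execution failed: %s\n", cudaGetErrorString(sync_err));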

And the kernel code:

__global__ void kernel_function(unsigned int pixel_size)
{
  // Global thread index across the whole grid
  unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;

  while (index < pixel_size)
  {
    // Process the pixel at this index here
    // Grid-stride loop: jump ahead by the total number of threads launched
    index += blockDim.x * gridDim.x;
  }
}

Where pixel_size is the number of pixels of the image block I want to run the transforms on.

So threads is always equal to 1024, which is what I want. Whenever blocks is less than or equal to 65535, my code works fine. But as soon as blocks goes above 65535 (which happens once pixel_size exceeds 65535 × 1024 = 67,107,840 pixels), the results I get are nonsense and look completely random. So what is the maximum number of blocks I can have for a one-dimensional problem? In the code above I assumed it was 2147483647. What am I doing wrong?

I feel like I am using the wrong hardware limit for the number of blocks, because when I hard-code it to 65535, this code works fine.
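For reference (this snippet is not part of the original question), the limits I am assuming can be printed straight from the same cudaGetDeviceProperties call; a minimal sketch, assuming device 0 is the TITAN:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, 0);

  // Compute capability of the device (3.5 for a GTX TITAN)
  printf("compute capability: %d.%d\n", prop.major, prop.minor);

  // Maximum grid dimensions reported by the hardware
  printf("maxGridSize: %d x %d x %d\n",
         prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

  // Maximum number of threads per block
  printf("maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);

  return 0;
}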

Thank you in advance for your answers.

Upvotes: 1

Views: 3023

Answers (1)

Jeff Bencteux

Reputation: 1426

Problem solved: I was compiling with flags for the 2.x architecture instead of 3.5, so the 2.x limit was applying (which is 65535 blocks max in the x dimension of the grid). After compiling with compute_35, sm_35, it worked.
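For completeness (the exact command line is not given above), a minimal sketch of an nvcc invocation targeting compute capability 3.5, assuming the source file is named fft.cu:

nvcc -gencode arch=compute_35,code=sm_35 -o fft fft.cu

The -arch=sm_35 shorthand also selects the 3.5 target:

nvcc -arch=sm_35 -o fft fft.cu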

Thanks @talonmies.

Upvotes: 4
