Mourad el Maouchi

CUDA Coordinates of 3D image

I have a 3D image with dimensions 512*512*512, and I have to process every voxel individually. However, I can't find launch dimensions that give me the x, y and z values I need to address each voxel.

In my kernel I have:

int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
int z = blockIdx.z * blockDim.z + threadIdx.z;

I launch the kernel with:

Kernel<<<dim3(8,8), dim3(8,8,16)>>>();

I chose these because I expected 64 blocks of 1024 threads each to give me every pixel. But how do I get the coordinate values with those dimensions?

When launching the kernel, I have to choose dimensions such that the x, y and z values actually run from 0 to 511 (which would then give me the position of every pixel). But with every combination I try, the kernel either does not run, or it runs but the values don't get high enough.

The program should give every thread an (x, y, z) that corresponds to one pixel of the image. In the simplest form, I am just trying to print the coordinates to see whether all of them get printed.

Any help?

EDIT:

My GPU's properties:

Compute capability: 2.0
Name: GeForce GTX 480

My program code, just to test it out:

#include <stdio.h>
#include <cuda.h>
#include <stdlib.h>

// Device code
__global__ void Kernel()
{
    // Here I should somehow get the x, y and z values for every pixel possible in the 512*512*512 image
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;

    printf("Coords: (%i, %i, %i)\n", x, y, z);
}

// Host code
int main(int argc, char** argv) {

    Kernel<<<dim3(8, 8), dim3(8,8,16)>>>(); //This invokes the kernel
    cudaDeviceSynchronize();

    return 0;
}


Answers (1)

Robert Crovella

To cover a 512x512x512 space with the indexing you have shown (i.e. one thread per voxel) you would need a kernel launch something like this:

Kernel<<<dim3(64,64,64), dim3(8,8,8)>>>();

When I multiply any of the dimensional components:

64*8

I get 512. This gives me a grid of 512 threads in each of 3 dimensions. Your indexing will work with this arrangement as-is to produce one unique thread per voxel.

The above assumes a cc2.0 or higher device (your mention of 1024 threads per block suggests that you have a cc2.0+ device), which permits 3D grids. If you have a cc1.x device, you will need to modify your indexing.

In that case, you might want something like this:

int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = (blockIdx.y%64) * blockDim.y + threadIdx.y;
int z = (blockIdx.y/64) * blockDim.z + threadIdx.z;

along with a kernel launch like this:

Kernel<<<dim3(64,4096), dim3(8,8,8)>>>();

Here's a fully worked example (cc2.0), based on the code you have now shown:

$ cat t604.cu
#include <stdio.h>
#include <stdlib.h>

#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)

// Device code
__global__ void Kernel()
{
    // Global (x, y, z) coordinates of the voxel handled by this thread
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;

    if ((x==511)&&(y==511)&&(z==511)) printf("Coords: (%i, %i, %i)\n", x, y, z);
}

// Host code
int main(int argc, char** argv) {
    cudaFree(0);
    cudaCheckErrors("CUDA is not working correctly");
    Kernel<<<dim3(64, 64, 64), dim3(8,8,8)>>>(); //This invokes the kernel
    cudaDeviceSynchronize();
    cudaCheckErrors("kernel fail");

    return 0;
}
$ nvcc -arch=sm_20 -o t604 t604.cu
$ cuda-memcheck ./t604
========= CUDA-MEMCHECK
Coords: (511, 511, 511)
========= ERROR SUMMARY: 0 errors
$

Note that I have elected to print only one line. I did not want to wade through 512x512x512 lines of printout: it would take a very long time to run, and in-kernel printf is limited in output volume anyway.
