hthms

Reputation: 883

CUDA kernel launch macro with templates

I made a macro to simplify CUDA kernel calls:

#define LAUNCH LAUNCH_ASYNC

#define LAUNCH_ASYNC(kernel_name, gridsize, blocksize, ...) \
    LOG("Async kernel launch: " #kernel_name);              \
    kernel_name <<< (gridsize), (blocksize) >>> (__VA_ARGS__);

#define LAUNCH_SYNC(kernel_name, gridsize, blocksize, ...)     \
    LOG("Sync kernel launch: " #kernel_name);                  \
    kernel_name <<< (gridsize), (blocksize) >>> (__VA_ARGS__); \
    cudaDeviceSynchronize();                                   \
    // error check, etc...

Usage:

LAUNCH(my_kernel, 32, 32, param1, param2)

LAUNCH(my_kernel<int>, 32, 32, param1, param2)

This works fine; by switching the first #define I can enable synchronous calls and error checking for debugging.

However, it does not work with multiple template arguments, as below:

LAUNCH(my_kernel<int,float>, 32, 32, param1, param3)

The error message I get in the line where I call the macro:

error : expected a ">"

Is it possible to make this macro work with multiple template arguments?

Upvotes: 1

Views: 1354

Answers (3)

yourstruly

Reputation: 1002

Consider this solution, which also checks for a launch error and exits on failure:

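#include <cstdio>   // fprintf
#include <cstdlib>  // exit, EXIT_FAILURE

// Print the failed operation and the CUDA error string, then exit.
// The string is printed directly; the earlier strncpy into a fixed buffer
// could leave it without a terminating '\0'.
inline void echoError(cudaError_t e, const char *strs) {
    if (e != cudaSuccess) {
        fprintf(stderr, "Failed to %s, errorCode %s\n",
                strs, cudaGetErrorString(e));
        exit(EXIT_FAILURE);
    }
}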


#define CUDA_KERNEL_DYN(kernel, bpg, tpb, shd, ...){                     \
    kernel<<<bpg,tpb,shd>>>( __VA_ARGS__ );                              \
    cudaError_t err = cudaGetLastError();                                \
    echoError(err, #kernel);                                              \
}

Upvotes: 0

Developer Paul

Reputation: 1510

Something else you could try that I have used (based on the macro you posted) is wrapping the kernel block size and grid size arguments in their own macro:

#define KERNEL_ARGS2(grid, block) <<< grid, block >>>
#define KERNEL_ARGS3(grid, block, sh_mem) <<< grid, block, sh_mem >>>
#define KERNEL_ARGS4(grid, block, sh_mem, stream) <<< grid, block, sh_mem, stream >>>

Now you should be able to use your macro like so:

#define CUDA_LAUNCH(kernel_name, gridsize, blocksize, ...) \
kernel_name KERNEL_ARGS2(gridsize, blocksize)(__VA_ARGS__);

You can use it like:

CUDA_LAUNCH(my_kernel, grid_size, block_size, input, output, size);

This will launch the kernel called 'my_kernel' with the given grid and block sizes, passing input, output and size as the kernel arguments (the values themselves, not their declarations).

Upvotes: 1

The problem is that the preprocessor knows nothing about angle bracket nesting, so it interprets the comma between them as a macro argument separator.
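With the call from the question, kernel_name ends up as `my_kernel<int` and gridsize as `float>`, which is presumably where the `expected a ">"` error comes from. To see the splitting without a CUDA machine, here is a small host-only C++ sketch. COUNT_ARGS and COUNT_ARGS_IMPL are hypothetical helper macros (not part of the question's code) that report how many arguments the preprocessor sees, for one to five arguments:

```cpp
// Host-only sketch: shows how the preprocessor splits macro arguments.
// COUNT_ARGS / COUNT_ARGS_IMPL are hypothetical helpers for illustration;
// they only handle between one and five arguments.
#define COUNT_ARGS_IMPL(a1, a2, a3, a4, a5, count, ...) count
#define COUNT_ARGS(...) COUNT_ARGS_IMPL(__VA_ARGS__, 5, 4, 3, 2, 1, 0)

// One template argument: no comma inside <>, so three arguments are seen.
constexpr int single_template_arg = COUNT_ARGS(my_kernel<int>, 32, 32);

// Two template arguments: the comma inside <> splits the kernel name in two.
constexpr int two_template_args = COUNT_ARGS(my_kernel<int, float>, 32, 32);

// Parentheses do nest for the preprocessor, so this comma is protected.
constexpr int parenthesized = COUNT_ARGS((my_kernel<int, float>), 32, 32);

static_assert(single_template_arg == 3, "kernel name stays in one piece");
static_assert(two_template_args == 4, "kernel name is split at the comma");
static_assert(parenthesized == 3, "parentheses keep the name together");
```

The arguments are never compiled as code here (the helper discards them), so my_kernel does not need to exist for this to build.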

If the kernel-launch syntax supports parentheses around the kernel name (I can't check now, not on a CUDA machine), you could do this:

LAUNCH((my_kernel<int, float>), 32, 32, param1, param3)
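As a sketch of the idea on the host side only: CALL below is a hypothetical stand-in for LAUNCH that makes an ordinary function call, and scaled_sum is a made-up template. It shows that the parenthesized name gets through the macro intact, and also a second option, naming the instantiation first so the macro only sees a plain identifier. It does not show whether `<<< >>>` accepts either form; that part is still an assumption.

```cpp
// Host-only stand-in: CALL plays the role of LAUNCH but expands to an
// ordinary function call, so it says nothing about <<< >>> itself.
// CALL and scaled_sum are hypothetical names used for illustration.
#define CALL(function_name, ...) function_name(__VA_ARGS__)

template <typename A, typename B>
double scaled_sum(A a, B b) {
    return static_cast<double>(a) + 2.0 * static_cast<double>(b);
}

// Option 1: parenthesize the name so its comma is not an argument separator.
// Expands to (scaled_sum<int, float>)(1, 2.0f).
inline double call_parenthesized() {
    return CALL((scaled_sum<int, float>), 1, 2.0f);
}

// Option 2: name the instantiation first; the macro then sees one identifier.
inline double call_through_pointer() {
    auto instance = scaled_sum<int, float>;  // deduced as a function pointer
    return CALL(instance, 1, 2.0f);
}
```

Option 2 costs an extra line per call site but does not depend on how the launch syntax treats parentheses.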

Upvotes: 5
