user2390724

Reputation: 15

CUDA profiling inside kernel

Is there any way to profile part of a CUDA kernel rather than the kernel as a whole? My kernel calls several device functions and I want to measure how long each of them takes. Are there any flags, events, or instructions I can set so that the timings become visible in the NVIDIA Visual Profiler? Or do I need to do it manually by inserting cudaEventCreate and similar calls?

Upvotes: 1

Views: 1463

Answers (1)

tera

Reputation: 7255

You can time specific parts of your kernel manually using the clock() or clock64() functions:

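// Total cycles, summed over all threads. This must be a __device__ variable
// (not a host pointer) so that the kernel can update it with atomicAdd().
__device__ unsigned long long time_spent;

__global__ void kernel(...)
{
    unsigned int t1, t2;
    // ...
    t1 = clock();
    // code of interest
    t2 = clock();
    // The unsigned 32-bit difference is wraparound-safe; widen it before accumulating.
    atomicAdd(&time_spent, (unsigned long long)(t2 - t1));
}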

`clock()` officially returns a clock_t, but I prefer the explicit use of unsigned int to make it obvious how the above code correctly handles wraparound of clock values (as long as the timed code does not take more than 2^32-1 cycles to complete).
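As a small illustration of why the unsigned subtraction survives a wraparound, here is a sketch with hand-picked counter values. The helper name elapsed_cycles is hypothetical and not part of the CUDA API; it only restates the `t2 - t1` expression for two 32-bit readings.

```cuda
#include <cstdint>

// Hypothetical helper (not a CUDA API): elapsed cycles between two 32-bit
// counter readings. Unsigned subtraction is performed modulo 2^32, so the
// result is still correct if the counter wrapped once between t1 and t2.
__host__ __device__ inline uint32_t elapsed_cycles(uint32_t t1, uint32_t t2)
{
    return t2 - t1;
}

// Worked example with a wrap between the two readings:
//   t1 = 0xFFFFFFF0  (16 cycles before the counter wraps to 0)
//   t2 = 0x00000010  (16 cycles after the wrap)
//   elapsed_cycles(t1, t2) == 0x20, i.e. 32 cycles
```

The result is only meaningful if the counter wrapped at most once, which is the 2^32-1 cycle limit mentioned above.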

Make sure to also time the code with

    t1 = clock();
    t2 = clock();

back-to-back so you can subtract the timing overhead.
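Putting both measurements together, a complete program might look roughly like the sketch below. Everything beyond clock() and atomicAdd() is an assumption made for illustration: the counter time_overhead, the helper run_timing, the placeholder arithmetic loop, and the launch configuration are not from the original answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Cycle totals, summed over all threads (names are illustrative).
__device__ unsigned long long time_spent;     // timed region, including timing overhead
__device__ unsigned long long time_overhead;  // back-to-back clock() calls only

__global__ void kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Cost of the timing calls themselves.
    unsigned int t1 = clock();
    unsigned int t2 = clock();
    atomicAdd(&time_overhead, (unsigned long long)(t2 - t1));

    // Code of interest (placeholder workload).
    float x = data[i];
    t1 = clock();
    for (int k = 0; k < 100; ++k)
        x = x * 1.0001f + 0.5f;
    t2 = clock();
    atomicAdd(&time_spent, (unsigned long long)(t2 - t1));

    data[i] = x;  // store the result so the loop is not optimized away
}

// Hypothetical host helper: runs the kernel once and returns both totals.
void run_timing(int n, unsigned long long *spent, unsigned long long *overhead)
{
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Reset both device counters before the launch.
    unsigned long long zero = 0;
    cudaMemcpyToSymbol(time_spent, &zero, sizeof(zero));
    cudaMemcpyToSymbol(time_overhead, &zero, sizeof(zero));

    kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();

    // Copy the accumulated totals back to the host.
    cudaMemcpyFromSymbol(spent, time_spent, sizeof(*spent));
    cudaMemcpyFromSymbol(overhead, time_overhead, sizeof(*overhead));
    cudaFree(d_data);
}

int main()
{
    const int n = 1 << 16;
    unsigned long long spent = 0, overhead = 0;
    run_timing(n, &spent, &overhead);
    printf("avg cycles per thread: %.1f measured, %.1f timing overhead\n",
           (double)spent / n, (double)overhead / n);
    return 0;
}
```

The totals are sums over all threads, so dividing by the thread count gives a per-thread average in clock cycles. The compiler is free to move instructions across the clock() calls, so it is worth treating the numbers as estimates and checking the generated code if they look implausible.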

Upvotes: 3
