Reputation: 15
Is there any way to profile part of a CUDA kernel rather than the kernel as a whole? I have several device function invocations and I want to measure how long each one takes. Are there any flags, events, or instructions I can set so that the timings become visible in the NVIDIA Visual Profiler? Or do I need to do it manually by inserting cudaEventCreate and similar functions?
Upvotes: 1
Views: 1463
Reputation: 7255
You can time specific parts of your kernel manually using the clock() or clock64() functions:
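// Cycle count accumulated over all threads; lives in device memory.
__device__ unsigned long long time_spent;

__global__ void kernel(...)
{
    unsigned int t1, t2;
    // ...
    t1 = clock();
    // code of interest
    t2 = clock();
    // Unsigned subtraction stays correct even if clock() wrapped in between.
    atomicAdd(&time_spent, (unsigned long long)(t2 - t1));
}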
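clock() officially returns a clock_t, but I prefer the explicit use of unsigned int to make it obvious how the above code correctly handles wraparound of clock values: unsigned subtraction is performed modulo 2^32, so t2 - t1 is still the true elapsed count even when t2 is numerically smaller than t1 (as long as the timed code does not take more than 2^32-1 cycles to complete).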
Make sure to also time the code with
t1 = clock();
t2 = clock();
back-to-back so you can subtract the timing overhead.
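For completeness, here is a minimal sketch of how the two measurements might be combined in a complete program. The device function work(), the counter overhead_spent, and the launch configuration are made-up placeholders for illustration, not anything from the question. Note that the totals are per-thread cycle counts summed over all threads, and that the compiler may schedule instructions across the clock() reads, so treat the numbers as approximate.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Cycle counts summed over all threads (names are illustrative).
__device__ unsigned long long time_spent;      // timed region, overhead included
__device__ unsigned long long overhead_spent;  // back-to-back clock() pairs

// Hypothetical stand-in for the device function being measured.
__device__ float work(float x)
{
    for (int i = 0; i < 100; ++i)
        x = x * 1.0001f + 0.5f;
    return x;
}

__global__ void kernel(float* out)
{
    unsigned int t1, t2;

    // Calibration: two reads with nothing in between.
    t1 = (unsigned int)clock();
    t2 = (unsigned int)clock();
    atomicAdd(&overhead_spent, (unsigned long long)(t2 - t1));

    // Region of interest.
    t1 = (unsigned int)clock();
    float r = work((float)threadIdx.x);
    t2 = (unsigned int)clock();
    atomicAdd(&time_spent, (unsigned long long)(t2 - t1));

    // Store the result so the timed computation is not optimized away.
    out[blockIdx.x * blockDim.x + threadIdx.x] = r;
}

int main()
{
    const int blocks = 4, threads = 64;  // arbitrary launch configuration
    unsigned long long zero = 0, total = 0, overhead = 0;

    // Reset both counters before the launch.
    cudaMemcpyToSymbol(time_spent, &zero, sizeof(zero));
    cudaMemcpyToSymbol(overhead_spent, &zero, sizeof(zero));

    float* d_out = nullptr;
    cudaMalloc(&d_out, blocks * threads * sizeof(float));

    kernel<<<blocks, threads>>>(d_out);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Read the counters back and report per-thread averages.
    cudaMemcpyFromSymbol(&total, time_spent, sizeof(total));
    cudaMemcpyFromSymbol(&overhead, overhead_spent, sizeof(overhead));

    const double n = (double)blocks * threads;
    const double raw = (double)total / n;
    const double ovh = (double)overhead / n;
    printf("raw: %.1f cycles, overhead: %.1f cycles, net: %.1f cycles per thread\n",
           raw, ovh, raw - ovh);

    cudaFree(d_out);
    return 0;
}
```

The subtraction is done in double on the host so that a noisy run in which the overhead total happens to exceed the raw total yields a small negative number rather than an unsigned underflow.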
Upvotes: 3