What clock do clock() and clock64() measure in CUDA?
According to the CUDA documentation, the value is a 'per-multiprocessor counter'. My understanding is that this refers to the primary GPU clock (not the shader clock).
But when I measure clock counts and convert them to time using the primary GPU clock frequency, the results are twice as large as the real values (I measure the real values as the kernel execution time from host code using CUDA events). This suggests that clock() counts cycles of the shader clock rather than the primary GPU clock.
How can I resolve this confusion?
EDIT: I calculated the primary GPU clock frequency by dividing the clock rate returned by cudaGetDeviceProperties by 2. As far as I understand, the value given by cudaGetDeviceProperties is the shader clock frequency.
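A short worked version of the arithmetic, assuming (as the question suspects) that the counter ticks at the shader clock f_s reported by cudaGetDeviceProperties, with N the difference between two clock64() readings:

$$
t_{\text{true}} = \frac{N}{f_s}, \qquad
t_{\text{computed}} = \frac{N}{f_s / 2} = \frac{2N}{f_s} = 2\,t_{\text{true}}
$$

Under that assumption, dividing by half the reported rate produces exactly the factor-of-two discrepancy seen against the CUDA events timing.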
It's true that the CUDA documentation says clock() and clock64() return a 'per-multiprocessor counter'. But on the Fermi architecture, what clock() and clock64() actually return is a counter that ticks at the shader clock.
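The clockRate field returned by cudaGetDeviceProperties is the shader clock frequency, expressed in kilohertz.
So to compute the elapsed time, divide the cycle count from clock() or clock64() by the shader clock frequency you get from cudaGetDeviceProperties directly, not by half of it. Because clockRate is in kHz, the result is in milliseconds.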