Reputation: 610
I need some clarification on timer resolution. I'm trying to learn profiling in OpenCL. I have a reduction algorithm implemented in OpenCL and want to measure the kernel execution time by getting the total elapsed time in the code given below. I ran this code on different devices and here are the results:
On CPU -- AMD FX 770K: Total time = 352,855,601; CL_DEVICE_PROFILING_TIMER_RESOLUTION = 69 ns
On GPU -- AMD Radeon R7 240: Total time = 172,297; CL_DEVICE_PROFILING_TIMER_RESOLUTION = 1 ns
On another GPU -- GeForce GT 610: Total time = 1,725,504; CL_DEVICE_PROFILING_TIMER_RESOLUTION = 1000 ns
Is the "Total time" given above in actual nanoseconds, or do I need to divide it by the timer resolution to get the actual execution time? How can the timer resolution help us?
Here is a part of the code:
/* Enqueue kernel */
err = clEnqueueNDRangeKernel(queue, kernel[i], 1, NULL, &global_size,
                             &local_size, 0, NULL, &prof_event);
if (err < 0) {
    /* OpenCL reports failures through its return code, not errno */
    fprintf(stderr, "Couldn't enqueue the kernel (error %d)\n", err);
    exit(1);
}
/* Finish processing the queue and get profiling information */
clFinish(queue);
clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_START,
                        sizeof(time_start), &time_start, NULL);
clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_END,
                        sizeof(time_end), &time_end, NULL);
/* Timestamps are 64-bit cl_ulong values, so print them as unsigned long long */
total_time = time_end - time_start;
printf("Total time = %llu\n\n", (unsigned long long)total_time);
Upvotes: 1
Views: 762
Reputation: 6343
The specification is pretty clear on this: "current device time counter in nanoseconds"
The times are always in nanoseconds. The resolution query tells you how accurate the data is. For example, given the measurements and resolutions you posted, you can deduce the error margin of each measurement:
AMD FX 770K: 352,855,601 ns ± 69 ns
AMD Radeon R7 240: 172,297 ns ± 1 ns
GeForce GT 610: 1,725,504 ns ± 1000 ns
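To make the error-margin idea concrete, here is a minimal sketch in plain C. It assumes the elapsed time and the timer resolution are both already available in nanoseconds, and that the resolution can be treated as the worst-case uncertainty of the measurement. The helper names `relative_error_percent`, `ns_to_ms`, and `print_margin` are made up for illustration and are not part of OpenCL.

```c
#include <stdio.h>

/* Hypothetical helper (not an OpenCL function): worst-case relative error,
 * in percent, of an elapsed time, treating the timer resolution as the
 * uncertainty. Both arguments are in nanoseconds. */
double relative_error_percent(unsigned long long elapsed_ns,
                              unsigned long long resolution_ns)
{
    if (elapsed_ns == 0)
        return 0.0; /* nothing was measured; avoid dividing by zero */
    return 100.0 * (double)resolution_ns / (double)elapsed_ns;
}

/* Convert nanoseconds to milliseconds for easier reading. */
double ns_to_ms(unsigned long long ns)
{
    return (double)ns / 1e6;
}

/* Print the measurement with its absolute and relative margin. */
void print_margin(const char *device,
                  unsigned long long elapsed_ns,
                  unsigned long long resolution_ns)
{
    printf("%s: %.3f ms +/- %llu ns (%.5f%%)\n",
           device,
           ns_to_ms(elapsed_ns),
           resolution_ns,
           relative_error_percent(elapsed_ns, resolution_ns));
}
```

Calling `print_margin("GeForce GT 610", 1725504ULL, 1000ULL)` with the figures from the question gives a relative margin of roughly 0.058%, while the same calculation for the FX 770K run is several orders of magnitude smaller. In other words, a coarse timer only starts to matter when the kernel runs for a time comparable to the resolution.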
Upvotes: 4