Reputation: 477
I am trying to measure my code performance (which is the execution of an OpenCL kernel) and I really need to understand the speed-up. I tried to use clock() and clock_gettime() functions.
In the first case my code is simple and straightforward, and it is measured correctly:
// BILLION is not defined in this snippet; it is assumed to be nanoseconds per second.
const long BILLION = 1000000000L;
struct timespec start_r, start_m, stop_r, stop_m;
double realtime, monotonic;
clock_t start2 = clock();  // clock() returns clock_t, not time_t
if (clock_gettime(CLOCK_REALTIME, &start_r) == -1) {
    cout << "clock realtime error!" << endl;
}
if (clock_gettime(CLOCK_MONOTONIC, &start_m) == -1) {
    cout << "clock monotonic error!" << endl;
}
// Workload being timed: sum all elements of data.
double res = 0.0;
for (unsigned long i = 0; i < total; i++) {
    res += data[i];
}
cout << "res = " << res << endl;
clock_t end2 = clock();
if (clock_gettime(CLOCK_REALTIME, &stop_r) == -1) {
    cout << "clock realtime error!" << endl;
}
if (clock_gettime(CLOCK_MONOTONIC, &stop_m) == -1) {
    cout << "clock monotonic error!" << endl;
}
cout << "Time clock() = " << (end2 - start2) / (double)CLOCKS_PER_SEC << endl;
// Elapsed seconds = whole seconds + nanosecond remainder converted to seconds.
realtime = (stop_r.tv_sec - start_r.tv_sec) + (double)(stop_r.tv_nsec - start_r.tv_nsec) / (double)BILLION;
monotonic = (stop_m.tv_sec - start_m.tv_sec) + (double)(stop_m.tv_nsec - start_m.tv_nsec) / (double)BILLION;
cout << "Realtime = " << realtime << endl << "Monotonic = " << monotonic << endl;
It gives understandable results - all three results are nearly the same.
When it comes to measuring the execution time of an OpenCL kernel, I do exactly the same, but the results I get are awful:
Time = 0.04
Realtime = 0.26113
Monotonic = 0.26113
Can you give me any idea of what is wrong with it? If it is a usual problem of measuring OpenCL kernel performance, can you suggest the best way to measure it? Thanks!
Upvotes: 1
Views: 7496
Reputation: 129314
The function clock will on some systems measure the CPU time used by the application. If your application uses OpenCL, it will probably spend most of its time waiting for the actual calculation to be performed by the graphics card, so clock will not give you "the real time it took to get the result". It's similar to using clock when reading data from a file, for example: the time it takes to read 100MB from a file is perhaps 2 seconds, but only 0.01s of CPU time is needed to send the commands to the hard disk and collect the data back once the hard disk controller has stored it in memory. So clock gives "0.01s", not "2s".
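The difference between CPU time and elapsed time can be seen without any OpenCL at all. The following is a minimal sketch, assuming a system where clock reports CPU time (as on Linux); the names Timings and time_a_wait are made up for this illustration, and the sleep only stands in for "waiting for the device to finish".

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical helper type: both measurements, in seconds.
struct Timings {
    double cpu_seconds;   // as reported by clock()
    double wall_seconds;  // as reported by std::chrono::steady_clock
};

// Sleeps for `pause` (a stand-in for waiting on a GPU or a disk)
// and reports how much time each clock observed.
Timings time_a_wait(std::chrono::milliseconds pause) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    std::this_thread::sleep_for(pause);  // the CPU is idle during this call

    const auto wall_stop = std::chrono::steady_clock::now();
    const std::clock_t cpu_stop = std::clock();

    Timings t;
    t.cpu_seconds = static_cast<double>(cpu_stop - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds = std::chrono::duration<double>(wall_stop - wall_start).count();
    return t;
}
```

Calling time_a_wait(std::chrono::milliseconds(200)) on such a system should report a wall time of roughly 0.2 s and a CPU time close to zero, which is the same pattern as the 0.04 versus 0.26 in the question.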
Upvotes: 2
Reputation: 7766
If you have access to a C++11 compiler consider using std::chrono instead: http://en.cppreference.com/w/cpp/chrono
There are three clock types built into the new C++ standard: std::chrono::system_clock, std::chrono::steady_clock, and std::chrono::high_resolution_clock.
Furthermore, the library is well designed to handle different levels of granularity, whether you want microsecond accuracy or something else. For software I have written in the past (large industrial engineering-type simulations) I've relied on std::chrono::steady_clock for all of my timings with no complaints :-).
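As a rough sketch of what timing with std::chrono::steady_clock can look like: the helper name seconds_taken is hypothetical, and the sum function simply mirrors the summation loop from the question as a stand-in workload.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds, measured with steady_clock.
template <typename F>
double seconds_taken(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    // duration<double> has a period of one second, so count() is in seconds.
    return std::chrono::duration<double>(stop - start).count();
}

// Stand-in workload: the summation loop from the question.
double sum(const std::vector<double>& data) {
    double res = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        res += data[i];
    }
    return res;
}
```

A call such as seconds_taken([&] { res = sum(data); }) returns the elapsed time for one run. For an OpenCL kernel, the callable would presumably need to include a blocking wait such as clFinish on the command queue, since enqueueing a kernel returns before the kernel has finished; that part is an assumption about the asker's setup and is not shown in the question.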
Upvotes: 2