Reputation: 234
I have a function that looks like this:
int doSomething() {
<C++ host code>
<CUDA device code>
<C++ host code>
<...>
}
I would like to measure the running time of this function with high precision (at least millisecond resolution) on both Linux and Windows.
I know how to measure the running time of a CUDA program with events, and I have found very accurate libraries for measuring the CPU time used by my process, but I want to measure the overall running time. I can't measure the two times separately and add them together, because device code and host code can run in parallel.
I want to use as few external libraries as possible, but I am interested in any good solution.
Upvotes: 1
Views: 2432
Reputation: 115
For Windows, you can use the performance counter:
#include <windows.h>
#include <stdio.h>
LARGE_INTEGER perfCntStart, perfCntStop, perfFreq;
::QueryPerformanceFrequency( &perfFreq ); // counter ticks per second
::QueryPerformanceCounter( &perfCntStart ); // start of the timed region
// ... do something ...
::QueryPerformanceCounter( &perfCntStop ); // end of the timed region
// elapsed time in seconds; double keeps more precision than float
printf( "elapsed: %f s\n", double( perfCntStop.QuadPart - perfCntStart.QuadPart ) / double( perfFreq.QuadPart ) );
Upvotes: 0
Reputation: 151799
According to the sequence you have shown, I would recommend you do the following:
int doSomething() {
<C++ host code>
<CUDA device code>
<C++ host code>
<...>
cudaDeviceSynchronize(); // add this
}
and:
<use your preferred CPU high precision measurement start function>
doSomething();
<use your preferred CPU high precision measurement stop function>
The added cudaDeviceSynchronize() call is not necessary if you have some prior implicit synchronization, such as a cudaMemcpy() call after the last kernel in the <CUDA device code> section.
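As a rough illustration of the pattern above, here is a minimal host-side sketch that uses std::chrono::steady_clock, which is available on both Linux and Windows without external libraries. This is one possible choice of CPU timer, not the only one. The body of doSomething() is a hypothetical placeholder (a short sleep) standing in for the real host and device code, and elapsedMs is a helper name made up for this example.

```cpp
#include <chrono>
#include <thread>

// Hypothetical stand-in for the function from the question. In the real
// function, the CUDA work should be finished before returning, e.g. via
// cudaDeviceSynchronize() or an implicitly synchronizing cudaMemcpy().
int doSomething() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20)); // placeholder work
    return 0;
}

// Returns the wall-clock time, in milliseconds, taken by one call to f().
// steady_clock is monotonic, so it is not affected by system clock changes.
template <typename F>
double elapsedMs(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling elapsedMs([] { doSomething(); }) then gives the overall running time of the call, covering host and device work together as long as the device work has completed before doSomething() returns.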
Responding to a question in the comments below: @JackOLantern seems to be suggesting a high-precision CPU timing method with start (tic) and stop (toc) points in the answer here, as talonmies also pointed out. If you don't like the results returned by CLOCK_MONOTONIC, you might also try specifying CLOCK_REALTIME_HR instead, if your system defines it. On a Linux box, run man clock_gettime for more info.
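For reference, a tic/toc pair built on clock_gettime() with CLOCK_MONOTONIC might look roughly like the sketch below. This is not the code from the linked answer. The names tic and secondsBetween are invented for this illustration, and the sketch assumes a POSIX system such as Linux.

```cpp
#include <time.h>

// Reads the monotonic clock; call once before and once after the timed region.
timespec tic() {
    timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t;
}

// Seconds elapsed between two readings returned by tic().
double secondsBetween(const timespec& start, const timespec& stop) {
    return double(stop.tv_sec - start.tv_sec)
         + double(stop.tv_nsec - start.tv_nsec) * 1e-9;
}
```

Take one reading with tic() before doSomething() and one after it, then pass both to secondsBetween(). On older glibc versions you may need to link with -lrt.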
Upvotes: 2