SqrtPi

Reputation: 234

Measure running time of C++ and CUDA code

I have a function that looks like this:

int doSomething() {
    <C++ host code>
    <CUDA device code>
    <C++ host code>
    <...>
}

I would like to measure the running time of this function with high precision (millisecond resolution or better), on both Linux and Windows.

I know how to measure the running time of CUDA code with events, and I have found very accurate libraries for measuring the CPU time used by my process, but I want to measure the overall running time. I can't measure the two times separately and add them together, because device code and host code can run in parallel.

I want to use as few external libraries as possible, but I am interested in any good solution.

Upvotes: 1

Views: 2432

Answers (2)

user2347380

Reputation: 115

For Windows:

#include <windows.h>
#include <stdio.h>

LARGE_INTEGER perfCntStart, perfCntStop, proc_freq;
::QueryPerformanceFrequency( &proc_freq );    // counter ticks per second
::QueryPerformanceCounter( &perfCntStart );   // start timestamp

// ... do something ...

::QueryPerformanceCounter( &perfCntStop );    // stop timestamp
// Elapsed wall-clock time in seconds; double avoids the precision loss of float on large tick counts
printf( "elapsed: %f s\n", double( perfCntStop.QuadPart - perfCntStart.QuadPart ) / double( proc_freq.QuadPart ) );

Upvotes: 0

Robert Crovella

Reputation: 151799

According to the sequence you have shown, I would recommend you do the following:

int doSomething() {
  <C++ host code>
  <CUDA device code>
  <C++ host code>
  <...>
  cudaDeviceSynchronize();  // add this
}

and:

<use your preferred CPU high precision measurement start function>
doSomething();
<use your preferred CPU high precision measurement stop function>

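Kernel launches are asynchronous with respect to the host, so without a synchronization point the stop timestamp could be taken while the device is still working. The added cudaDeviceSynchronize() call is not necessary if you already have some implicit synchronization, such as a cudaMemcpy() call after the last kernel in the <CUDA device code> section.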

Responding to a question in the comments below, @JackOLantern seems to be suggesting a high-precision CPU timing method with start (tic) and stop (toc) points in the answer here, as talonmies also pointed out. If you don't like the results returned by CLOCK_MONOTONIC, you might also try specifying CLOCK_REALTIME_HR instead, where your system provides it. On a Linux box, run man clock_gettime for more info.

Upvotes: 2
