Timing Kernel Execution with CPU Timers

Question

I've been trying to measure a CUDA kernel's execution time using the CPU timer approach shown on the NVIDIA website, but I'm having an issue using the myCPUTimer() function:

  T1=myCPUTimer();
  vectorAdd<<<...>>>(d_A, d_B, d_C, numElements);
  cudaDeviceSynchronize();
  T2=myCPUTimer();

After compiling, I get the error undefined reference to 'myCPUTimer', and I can't find any documentation online on how to use this function.

Robert Crovella · Accepted Answer

I guess you are referring to this.

The text there states:

the generic host time-stamp function myCPUTimer()

That function is not provided for you, and you can't use it as-is. The "generic" there means it is some function you will use and provide, that is perhaps platform (i.e. OS) specific.

You have to provide a function like that yourself. In that context it is a placeholder: no function with exactly that name exists in the CUDA runtime or the standard library.

You can find many questions here on SO about how to do host timing of CUDA kernels, such as this one.

On Linux, for example, you could do something like this:

#include <time.h>
#include <sys/time.h>
#define USECPSEC 1000000ULL

// Returns the current time in microseconds, minus an optional start value.
unsigned long long myCPUTimer(unsigned long long start=0){

  timeval tv;
  gettimeofday(&tv, 0);
  return ((tv.tv_sec*USECPSEC)+tv.tv_usec)-start;
}

This returns a "timestamp" in microseconds as an unsigned long long, using a fairly high-resolution CPU-based timer. Passing an earlier timestamp as start returns the time elapsed since that timestamp.
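If you are not on Linux, a similar timer can be sketched with std::chrono instead of gettimeofday. This is only an illustration, not something from the NVIDIA page: it assumes a C++11 compiler, and the helper timeRegionUs is a hypothetical name introduced here to wrap the "timestamp, run, timestamp" pattern from the question.

```cpp
#include <chrono>

// Assumed portable stand-in for the gettimeofday-based timer:
// microseconds from a monotonic clock, minus an optional start value.
unsigned long long myCPUTimer(unsigned long long start = 0) {
  using namespace std::chrono;
  const auto now = steady_clock::now().time_since_epoch();
  return static_cast<unsigned long long>(
             duration_cast<microseconds>(now).count()) - start;
}

// Hypothetical helper: runs a host-side callable once and returns
// the elapsed wall-clock time in microseconds.
template <typename F>
unsigned long long timeRegionUs(F&& region) {
  const unsigned long long t0 = myCPUTimer();
  region();               // work to be timed
  return myCPUTimer(t0);  // elapsed time since t0
}
```

In a .cu file, the timed region would be the kernel launch followed by cudaDeviceSynchronize(). The synchronize call matters because the launch returns to the host before the kernel finishes. Something along these lines, where blocks and threads stand in for whatever launch configuration you use:

  unsigned long long us = timeRegionUs([&] {
    vectorAdd<<<blocks, threads>>>(d_A, d_B, d_C, numElements);
    cudaDeviceSynchronize();
  });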
