Reputation: 276
Right now, I'm trying to find a way to measure the time that a particular function takes (something like pthread_create). Now, of course, these types of functions are heavily optimized to take as little time as possible; so little, in fact, that my userspace timer, which uses gettimeofday and measures in microseconds, is unable to measure anything adequately.
Normally, if I could mess with the kernel, I'd use something like get_cycles to measure the raw number of cycles as a performance metric. However, I haven't found a way to do this in userspace. Is there a way to use get_cycles (or an equivalent) or some other higher precision timer I could use in userspace to measure extremely fast functions?
Upvotes: 4
Views: 11191
Reputation: 31
If get_cycles() works (returns a valid count) on your platform, you may be able to use its implementation in user land. For example, on arm64, this call maps to reading CNTVCT_EL0, the ARM counter, which is an ARM64 system register. Reference: https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/arch_timer.h#L200 plus the freely available ARM Developer documentation on the Generic Timer. Since this is an 'EL0' (i.e. lowest privilege level on arm64) register, it's accessible from user land, and so the assembly code in the referenced link should work as is.
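A minimal sketch of such a read, assuming GCC or Clang inline assembly and a kernel that leaves EL0 access to the virtual counter enabled. The helper name read_virtual_counter is made up for illustration, and the non-arm64 branch is only a fallback (my assumption) so the snippet builds on other machines:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: returns a non-decreasing tick count. */
static inline uint64_t read_virtual_counter(void)
{
#if defined(__aarch64__)
    uint64_t ticks;
    /* isb keeps the counter read from being executed ahead of earlier code. */
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(ticks) : : "memory");
    return ticks;
#else
    /* Fallback for other architectures (assumption): CLOCK_MONOTONIC in ns. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}
```

Reading the counter before and after the call under test and subtracting the two values gives the elapsed ticks on arm64 (or nanoseconds in the fallback branch).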
If you know the frequency of the oscillator (XO) that drives this counter, you can convert the count value you get to time (say, in ms). For example, if the XO is 50MHz, each count (and so your counter resolution) would be 20ns. Alternatively, on arm64 you can read the CNTFRQ_EL0 register to get the counter frequency.
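The conversion is plain arithmetic. Here is a small sketch; the function name ticks_to_ns is hypothetical, and the frequency is whatever you know about the oscillator or read back from the hardware:

```c
#include <stdint.h>

/* Converts a counter delta to nanoseconds for a counter running at freq_hz. */
static inline uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    /* Handle whole seconds and the remainder separately so that the
       multiplication by 1e9 does not overflow for realistic frequencies. */
    uint64_t secs = ticks / freq_hz;
    uint64_t rem  = ticks % freq_hz;
    return secs * 1000000000ull + (rem * 1000000000ull) / freq_hz;
}
```

With a 50MHz counter, ticks_to_ns(1, 50000000) is 20, matching the 20ns per count figure.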
POSIX APIs like clock_gettime() may give you a 'nanosecond-precise' timestamp, but its resolution is generally on the order of the system timer interrupt (say 1ms), since it may be returning the last timestamp saved by the kernel on its timer/scheduler interrupt rather than reading a hardware counter directly on each call you make from user land. So you won't be able to measure anything that takes less time than this resolution. You can check the resolution this API offers using clock_getres().
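A short sketch of that check. The wrapper name clock_resolution_ns is hypothetical; clock_getres() and the clock IDs are standard POSIX:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Returns the reported resolution of clock `id` in nanoseconds, or -1 on error. */
static long long clock_resolution_ns(clockid_t id)
{
    struct timespec res;
    if (clock_getres(id, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

Calling clock_resolution_ns(CLOCK_MONOTONIC) on the target machine tells you whether that clock is fine enough for the function you want to time. The value is system-dependent, so it is worth checking rather than assuming.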
So it depends on the resolution you are looking for. I lean towards reading from counters maintained in CPU system registers if I'm looking for microsecond or finer granularity (my argument: atomic and low-latency reads + high resolution); otherwise you have clock_gettime(). Also, as Matt Ellen suggested in another answer, taking an average would smooth out the deviation you might see when your thread gets context switched out between two counter reads/measurements. I'd also increase the scheduling priority of the thread while you are benchmarking.
Upvotes: 0
Reputation: 78993
My linux man page tells me
CONFORMING TO
SVr4, 4.3BSD. POSIX.1-2001 describes gettimeofday() but not settimeofday(). POSIX.1-2008 marks gettimeofday() as obsolete, recommending the use of clock_gettime(2) instead.
Upvotes: 1
Reputation: 11612
Have you tried getting the time it takes to execute your function, say, 10000 times and taking the mean? That would save the bother of finding more accurate timing functions.
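Something along these lines, as a rough sketch. The names mean_call_ns and sample_work are made up for illustration, and I'm assuming CLOCK_MONOTONIC is good enough for the total:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Mean wall-clock cost of fn() in nanoseconds over `iterations` calls. */
static double mean_call_ns(void (*fn)(void), long iterations)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        fn();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double total_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                    + (double)(end.tv_nsec - start.tv_nsec);
    return total_ns / (double)iterations;
}

/* Placeholder workload (hypothetical); swap in the call being measured. */
static void sample_work(void)
{
    static volatile unsigned long sink;
    sink++;
}
```

mean_call_ns(sample_work, 10000) then gives the average per call. Note that the loop and call overhead are included in the figure, and for something like pthread_create you would also have to account for joining or detaching the threads you create.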
Having said that, this answer: "Is gettimeofday() guaranteed to be of microsecond resolution?" seems to indicate better functions to use than gettimeofday().
Upvotes: 2
Reputation: 81492
clock_gettime allows you to get a nanosecond-precise time from the thread start, process start or epoch.
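A minimal sketch of reading those clocks. The helper name now_ns is hypothetical; the clock IDs are the standard POSIX ones. Note that CLOCK_THREAD_CPUTIME_ID and CLOCK_PROCESS_CPUTIME_ID count CPU time consumed by the thread or process, while CLOCK_REALTIME counts from the epoch:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Current reading of clock `id` in nanoseconds, or -1 on error. */
static long long now_ns(clockid_t id)
{
    struct timespec ts;
    if (clock_gettime(id, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Take now_ns(CLOCK_THREAD_CPUTIME_ID) before and after the call you are measuring and subtract to get the CPU time the calling thread spent in it.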
Upvotes: 4
Reputation: 91320
Use RDTSC (if you're on x86), or clock_gettime
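unsigned long long cycleCount(void) {
    unsigned int lo, hi;
    /* RDTSC returns the 64-bit time-stamp counter split across EDX:EAX. */
    __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
}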
Upvotes: 7