Reputation: 2749
The Windows function QueryThreadCycleTime() gives the number of "CPU clock cycles" used by a given thread. The Windows manual boldly states:
Do not attempt to convert the CPU clock cycles returned by QueryThreadCycleTime to elapsed time.
I would like to do exactly this, for most Intel and AMD x86_64 CPUs.
It doesn't need to be very accurate, because you can't expect perfection from cycle counters like RDTSC anyway.
I just need some kludgey way to get the time factor seconds / QueryThreadCycleTime for these CPUs.
First, I imagine that QueryThreadCycleTime uses RDTSC internally.
I imagine that on some CPUs, a constant-rate TSC is used, so changing the actual clock rate (e.g. with variable-frequency CPU power management) doesn't affect the time/TSC factor.
On other CPUs, that rate might change, so I'd have to query this factor periodically.
Before anyone cites the XY Problem, I should note that I'm not really interested in alternative solutions. This is because I have two hard requirements for profiling that no other method meets.
It must measure thread time, not wall time: sleep(1) should not be reported as 1 second, but a busy loop lasting 1 second should. In other words, the profiler should not say that a task ran for 10ms when its thread was only active for 1ms. This is the reason I cannot use QueryPerformanceCounter().
It must have fine resolution, which rules out GetThreadTimes(). The tasks I'm profiling might run for only a few microseconds.
As requested by @Ted Lyngmo, the goal is to implement computeFactor().
#include <stdint.h>
#include <stdio.h>
#include <windows.h>

double computeFactor();  // seconds per cycle; this is the function to implement

int main() {
    uint64_t start, end;
    QueryThreadCycleTime(GetCurrentThread(), &start);
    // insert task here, such as an actual workload or sleep(1)
    QueryThreadCycleTime(GetCurrentThread(), &end);
    printf("%lf\n", (end - start) * computeFactor());
    return 0;
}
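One kludgey way to approximate computeFactor() would be to calibrate: spin for a short interval, then divide the elapsed wall time by the cycle delta. Below is a minimal C++ sketch of that idea, not a definitive implementation. It assumes the thread is rarely preempted during the spin (preemption adds wall time without adding thread cycles, which inflates the factor) and that the cycle rate stays roughly constant afterwards. The names calibrateSecondsPerCycle and steadySeconds are hypothetical, and the counters are passed in as callables so the arithmetic can be checked without Windows.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of any Windows API): estimates seconds per
// counted cycle by spinning for at least `min_seconds` and dividing the
// elapsed wall time by the cycle delta.
//   read_cycles : callable returning the thread's cycle count (std::uint64_t),
//                 e.g. a lambda wrapping QueryThreadCycleTime on Windows.
//   read_seconds: callable returning a monotonic wall-clock time in seconds.
template <class CycleFn, class ClockFn>
double calibrateSecondsPerCycle(CycleFn read_cycles, ClockFn read_seconds,
                                double min_seconds) {
    assert(min_seconds > 0.0);
    const double t0 = read_seconds();
    const std::uint64_t c0 = read_cycles();
    double t1 = t0;
    std::uint64_t c1 = c0;
    // Busy-wait (never sleep) so the thread keeps accumulating cycles.
    while (t1 - t0 < min_seconds) {
        c1 = read_cycles();
        t1 = read_seconds();
    }
    if (c1 <= c0) return 0.0;  // no usable cycle delta; the caller should retry
    return (t1 - t0) / static_cast<double>(c1 - c0);
}

// Monotonic wall clock in seconds, built on std::chrono::steady_clock.
inline double steadySeconds() {
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
}
```

On Windows, read_cycles would be a lambda that calls QueryThreadCycleTime(GetCurrentThread(), &c) and returns c, and computeFactor() could return calibrateSecondsPerCycle(thatLambda, steadySeconds, 0.05). On CPUs without a constant-rate TSC the calibration would have to be repeated periodically.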
Upvotes: 2
Views: 1383
Reputation: 117643
Do not attempt to convert the CPU clock cycles returned by QueryThreadCycleTime to elapsed time.
I would like to do exactly this.
Your wish is obviously Denied!
A workaround that gets close to what you want is to create one thread that uses a steady_clock to sample QueryThreadCycleTime and/or GetThreadTimes at some specified frequency. Here's an example of how it could be done with a sampling thread taking a sample of both once every second.
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <iomanip>
#include <thread>
#include <vector>
#include <Windows.h>
using namespace std::literals::chrono_literals;
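// one sample of a single thread: its thread times plus its cycle count
struct FTs_t {
    FILETIME CreationTime, ExitTime, KernelTime, UserTime;
    ULONG64 CycleTime;
};

using Sample = std::vector<FTs_t>;

// print a FILETIME as one 64-bit count of 100-nanosecond intervals
std::ostream& operator<<(std::ostream& os, const FILETIME& ft) {
    // the high half holds the upper 32 bits, so it must be shifted by 32
    std::uint64_t bft = (std::uint64_t(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
    return os << bft;
}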

// print one line per thread: kernel time, user time, cycle count
std::ostream& operator<<(std::ostream& os, const Sample& smp) {
    size_t tno = 0;
    for (const auto& fts : smp) {
        os << " tno:" << std::setw(3) << tno << std::setw(10) << fts.KernelTime
           << std::setw(10) << fts.UserTime << std::setw(16) << fts.CycleTime << "\n";
        ++tno;
    }
    return os;
}

// the sampling thread
void ft_sampler(std::atomic<bool>& quit, std::vector<std::thread>& threads, std::vector<Sample>& samples) {
    auto tp = std::chrono::steady_clock::now(); // for steady sampling
    FTs_t fts;
    while (quit == false) {
        Sample s;
        s.reserve(threads.size());
        for (auto& th : threads) {
            // keep the sample only if both queries succeeded
            if (QueryThreadCycleTime(th.native_handle(), &fts.CycleTime) &&
                GetThreadTimes(th.native_handle(), &fts.CreationTime,
                               &fts.ExitTime, &fts.KernelTime, &fts.UserTime)) {
                s.push_back(fts);
            }
        }
        samples.emplace_back(std::move(s));
        tp += 1s; // add a second since we last sampled and sleep until that time_point
        std::this_thread::sleep_until(tp);
    }
}

// a worker thread
void worker(std::atomic<bool>& quit, size_t payload) {
    volatile std::uintmax_t x = 0;
    while (quit == false) {
        for (size_t i = 0; i < payload; ++i) ++x;
        std::this_thread::sleep_for(1us);
    }
}

int main() {
    // brace-initialized so this also compiles before C++17
    std::atomic<bool> quit_sampling{false}, quit_working{false};
    std::vector<std::thread> threads;
    std::vector<Sample> samples;
    // leave one hardware thread free for the sampler
    size_t max_threads = std::thread::hardware_concurrency() > 1 ? std::thread::hardware_concurrency() - 1 : 1;

    // start some worker threads
    for (size_t tno = 0; tno < max_threads; ++tno) {
        threads.emplace_back(std::thread(&worker, std::ref(quit_working), (tno + 100) * 100000));
    }

    // start the sampling thread
    auto smplr = std::thread(&ft_sampler, std::ref(quit_sampling), std::ref(threads), std::ref(samples));

    // let the threads work for some time
    std::this_thread::sleep_for(10s);

    // stop the sampler first so it never queries a joined thread
    quit_sampling = true;
    smplr.join();
    quit_working = true;
    for (auto& th : threads) th.join();

    std::cout << "Took " << samples.size() << " samples\n";
    size_t s = 0;
    for (const auto& smp : samples) {
        std::cout << "Sample " << s << ":\n" << smp << "\n";
        ++s;
    }
}
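To turn two such samples of the same thread into a rough seconds-per-cycle factor, one could divide the change in kernel plus user time by the change in cycle count. The following is a minimal sketch of that arithmetic only, with assumptions: CpuSample, filetimeTicks and secondsPerCycle are hypothetical names, the FILETIME values are taken to be already split into their two 32-bit halves, and the samples are assumed to be far enough apart that the coarse update granularity of GetThreadTimes does not dominate the result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, platform-neutral copy of the fields taken from one sample.
struct CpuSample {
    std::uint64_t kernel_ticks;  // KernelTime in 100 ns ticks
    std::uint64_t user_ticks;    // UserTime in 100 ns ticks
    std::uint64_t cycles;        // value from QueryThreadCycleTime
};

// Combine the two 32-bit halves of a FILETIME into one 64-bit tick count.
constexpr std::uint64_t filetimeTicks(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Estimated seconds per cycle between two samples of the same thread
// (a taken before b). Returns 0.0 when no cycles elapsed.
inline double secondsPerCycle(const CpuSample& a, const CpuSample& b) {
    assert(b.cycles >= a.cycles);
    const std::uint64_t dcycles = b.cycles - a.cycles;
    if (dcycles == 0) return 0.0;
    const std::uint64_t dticks =
        (b.kernel_ticks + b.user_ticks) - (a.kernel_ticks + a.user_ticks);
    // one tick is 100 ns
    return (static_cast<double>(dticks) * 100e-9) / static_cast<double>(dcycles);
}
```

With the program above, each FTs_t would be converted to a CpuSample via filetimeTicks(ft.dwHighDateTime, ft.dwLowDateTime), and the factor would be computed per thread between consecutive samples. The result is only an estimate and may drift on CPUs whose cycle rate changes.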
Upvotes: 1