Vortico

Reputation: 2749

How to convert QueryThreadCycleTime() to seconds?

The Windows function QueryThreadCycleTime() gives the number of "CPU clock cycles" used by a given thread. The Windows manual boldly states:

Do not attempt to convert the CPU clock cycles returned by QueryThreadCycleTime to elapsed time.

I would like to do exactly this, for most Intel and AMD x86_64 CPUs. It doesn't need to be very accurate, because you can't expect perfection from cycle counters like RDTSC anyway. I just need some kludgey way to get the time factor seconds / QueryThreadCycleTime for the CPUs.

First, I imagine that QueryThreadCycleTime uses RDTSC internally. I imagine that on some CPUs, constant rate TSC is used, so changing the actual clock rate (e.g. with variable-frequency CPU power management) doesn't affect the time/TSC factor. On other CPUs, that rate might change, so I'd have to query this factor periodically.

Why do I need this?

Before anyone cites the XY Problem, I should note that I'm not really interested in alternative solutions. This is because I have two hard requirements for profiling that no other method meets.

Minimal reproducible example

As requested by @Ted Lyngmo, the goal is to implement computeFactor().

#include <stdio.h>
#include <windows.h>

// to be implemented: returns seconds per thread cycle
double computeFactor(void);

int main(void) {
    ULONG64 start, end; // QueryThreadCycleTime takes a PULONG64
    QueryThreadCycleTime(GetCurrentThread(), &start);
    // insert task here, such as an actual workload or Sleep(1000)
    QueryThreadCycleTime(GetCurrentThread(), &end);
    printf("%lf\n", (double)(end - start) * computeFactor());
    return 0;
}
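
One possible shape for computeFactor(), sketched here in C++ under stated assumptions rather than as a definitive implementation: busy-spin for a short, fixed wall-clock interval and divide the elapsed seconds by the cycles counted over that interval. The name calibrate_seconds_per_cycle and the read_cycles callable are hypothetical; the counter is passed in as a parameter so the sketch compiles without Windows headers. The spin matters because a thread's cycle count only advances while the thread is actually running, so sleeping during calibration would count almost no cycles.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: estimates seconds per cycle for whatever counter
// read_cycles() returns. On Windows, read_cycles could be a lambda that
// calls QueryThreadCycleTime(GetCurrentThread(), &c) and returns c.
// Returns 0.0 if the counter did not advance.
template <typename ReadCycles>
double calibrate_seconds_per_cycle(ReadCycles read_cycles,
                                   std::chrono::milliseconds spin_for) {
    using clock = std::chrono::steady_clock;

    const auto t0 = clock::now();
    const std::uint64_t c0 = read_cycles();

    // Busy-wait so the calling thread keeps consuming cycles.
    while (clock::now() - t0 < spin_for) {
    }

    const std::uint64_t c1 = read_cycles();
    const auto t1 = clock::now();

    if (c1 <= c0) return 0.0;
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return seconds / static_cast<double>(c1 - c0);
}
```

If the thread is preempted during the spin, wall time keeps running while its cycle count does not, so a single run can overestimate the factor. Repeating the calibration a few times and keeping the smallest result is one way to reduce that error, and on CPUs without a constant-rate TSC the calibration would have to be repeated periodically.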

Upvotes: 2

Views: 1383

Answers (1)

Ted Lyngmo

Reputation: 117643

Do not attempt to convert the CPU clock cycles returned by QueryThreadCycleTime to elapsed time.

I would like to do exactly this.

Your wish is obviously Denied!

A workaround that gets close to what you want is to create one thread that uses a steady_clock to sample QueryThreadCycleTime and/or GetThreadTimes at a fixed frequency. Here's an example in which a sampling thread takes a sample of both once every second.

#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <functional> // std::ref
#include <iostream>
#include <iomanip>
#include <thread>
#include <vector>

#include <Windows.h>

using namespace std::literals::chrono_literals;

struct FTs_t {
    FILETIME CreationTime, ExitTime, KernelTime, UserTime;
    ULONG64 CycleTime;
};

using Sample = std::vector<FTs_t>;

std::ostream& operator<<(std::ostream& os, const FILETIME& ft) {
    // FILETIME holds a 64-bit count of 100 ns units split into two 32-bit halves
    std::uint64_t bft = (std::uint64_t(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
    return os << bft;
}

std::ostream& operator<<(std::ostream& os, const Sample& smp) {
    size_t tno = 0;
    for (const auto& fts : smp) {
        os << " tno:" << std::setw(3) << tno << std::setw(10) << fts.KernelTime
           << std::setw(10) << fts.UserTime << std::setw(16) << fts.CycleTime << "\n";
        ++tno;
    }
    return os;
}

// the sampling thread
void ft_sampler(std::atomic<bool>& quit, std::vector<std::thread>& threads, std::vector<Sample>& samples) {
    auto tp = std::chrono::steady_clock::now(); // for steady sampling

    FTs_t fts;
    while (quit == false) {
        Sample s;
        s.reserve(threads.size());
        for (auto& th : threads) {
            if (QueryThreadCycleTime(th.native_handle(), &fts.CycleTime) &&
                GetThreadTimes(th.native_handle(), &fts.CreationTime,
                               &fts.ExitTime, &fts.KernelTime, &fts.UserTime)) {
                s.push_back(fts);
            }
        }
        samples.emplace_back(std::move(s));

        tp += 1s; // add a second since we last sampled and sleep until that time_point
        std::this_thread::sleep_until(tp);
    }
}

// a worker thread
void worker(std::atomic<bool>& quit, size_t payload) {
    volatile std::uintmax_t x = 0;
    while (quit == false) {
        for (size_t i = 0; i < payload; ++i) ++x;
        std::this_thread::sleep_for(1us);
    }
}

int main() {
    std::atomic<bool> quit_sampling{false}, quit_working{false};
    std::vector<std::thread> threads;
    std::vector<Sample> samples;
    size_t max_threads = std::thread::hardware_concurrency() > 1 ? std::thread::hardware_concurrency() - 1 : 1;

    // start some worker threads
    for (size_t tno = 0; tno < max_threads; ++tno) {
        threads.emplace_back(std::thread(&worker, std::ref(quit_working), (tno + 100) * 100000));
    }

    // start the sampling thread
    auto smplr = std::thread(&ft_sampler, std::ref(quit_sampling), std::ref(threads), std::ref(samples));

    // let the threads work for some time
    std::this_thread::sleep_for(10s);

    quit_sampling = true;
    smplr.join();

    quit_working = true;
    for (auto& th : threads) th.join();

    std::cout << "Took " << samples.size() << " samples\n";

    size_t s = 0;
    for (const auto& smp : samples) {
        std::cout << "Sample " << s << ":\n" << smp << "\n";
        ++s;
    }
}
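
To turn two consecutive samples of the same thread into a rough seconds-per-cycle factor, one option is to divide the change in CPU time reported by GetThreadTimes (kernel plus user, in 100 ns units) by the change in CycleTime. The following is a minimal sketch under that assumption. ThreadSample and seconds_per_cycle are hypothetical names, not part of the code above or of the Windows API, and the FILETIME values are assumed to have been combined into 64-bit integers as (high << 32) | low beforehand.

```cpp
#include <cstdint>

// Hypothetical plain-integer view of one sample for one thread.
struct ThreadSample {
    std::uint64_t kernel_100ns; // KernelTime as a 64-bit count of 100 ns units
    std::uint64_t user_100ns;   // UserTime as a 64-bit count of 100 ns units
    std::uint64_t cycles;       // CycleTime from QueryThreadCycleTime
};

// Estimates seconds per cycle between an earlier and a later sample.
// Returns 0.0 when no estimate is possible (no cycles or no CPU time elapsed).
double seconds_per_cycle(const ThreadSample& earlier, const ThreadSample& later) {
    if (later.cycles <= earlier.cycles) return 0.0;

    const std::uint64_t cpu_before = earlier.kernel_100ns + earlier.user_100ns;
    const std::uint64_t cpu_after = later.kernel_100ns + later.user_100ns;
    if (cpu_after <= cpu_before) return 0.0;

    // One FILETIME unit is 100 ns, i.e. 1e-7 seconds.
    const double seconds = static_cast<double>(cpu_after - cpu_before) * 1e-7;
    return seconds / static_cast<double>(later.cycles - earlier.cycles);
}
```

The times from GetThreadTimes tend to be much coarser than the cycle counter, so the estimate should only be trusted over intervals in which the thread accumulated a substantial amount of CPU time, such as the one-second sampling period used above.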

Upvotes: 1
