Muhammad Umer Asif

Reputation: 255

Calculating clock cycles in WINAPI gives very different results

My code for calculating the clock cycles needed to create a thread is:

#include <Windows.h>
#include <stdio.h>
#include <conio.h>

#define RUN 20000

DWORD WINAPI Calc(LPVOID Param){ return 0; }   /* empty thread procedure; must still return an exit code */

int main(int argc, char* argv[])
{
    ULONG64 Sum = 0;

    for (int i = 0; i < RUN; i++)
    {
        ULONG64 ret = 0;
        DWORD ThreadId;
        HANDLE ThreadHandle;

        /* create the thread */
        ThreadHandle = CreateThread(
            NULL,           /* default security attributes */
            0,              /* default stack size */
            Calc,           /* thread function */
            NULL,           /* parameter to thread function */
            0,              /* default creation flags */
            &ThreadId);     /* returns the thread identifier */

        QueryThreadCycleTime(ThreadHandle, &ret);

        WaitForSingleObject(ThreadHandle, INFINITE);

        CloseHandle(ThreadHandle);

        Sum += ret;
    }

    printf_s("The Average no of cycles in %d runs is %lu\n", RUN, (DWORD)(Sum/RUN));

    _getch();
    return 0;
}

This code reports roughly 1,000 clock cycles on my modest laptop. But if I call QueryThreadCycleTime after WaitForSingleObject instead, the result is very different, on the order of 200,000. I looked around a lot but didn't find an explanation. What is the reason for this behavior?

Upvotes: 0

Views: 359

Answers (1)

David Heffernan

Reputation: 613382

The difference is whether or not you wait for the thread to complete execution before you query it. If you wait, the thread has run to completion by the time you query, so it has consumed more clock cycles.

Note that you are not timing the process of creating the thread. You are timing execution of the thread procedure. To be clear: the value returned by QueryThreadCycleTime is the number of cycles consumed executing the thread, not the number of cycles spent executing CreateThread, nor the elapsed wall clock time from calling CreateThread to the thread starting execution.

And you are doing so at an indeterminate point. For instance, in your code, QueryThreadCycleTime sometimes returns 0 because the thread has not even started executing at the point where the main thread calls QueryThreadCycleTime.
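To see that indeterminate sampling point for yourself, something along these lines could help. It is only a sketch: it queries each thread twice, once straight after CreateThread and once after the wait, and counts how often each query reads zero. The helper name count_zero_samples, the RUN value of 1000 and the array names are my own choices for illustration and do not come from the question. The Windows-specific part is guarded by _WIN32 so the helper compiles elsewhere too.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the question): counts how many samples
   are zero, i.e. how often the thread had consumed no cycles when queried. */
static size_t count_zero_samples(const unsigned long long *samples, size_t n)
{
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++)
        if (samples[i] == 0)
            zeros++;
    return zeros;
}

#ifdef _WIN32
#include <Windows.h>

#define RUN 1000   /* assumed run count, chosen only for this sketch */

static DWORD WINAPI Calc(LPVOID Param) { (void)Param; return 0; }

int main(void)
{
    static unsigned long long early[RUN], late[RUN];
    size_t n = 0;

    for (int i = 0; i < RUN; i++)
    {
        ULONG64 before = 0, after = 0;
        HANDLE h = CreateThread(NULL, 0, Calc, NULL, 0, NULL);
        if (h == NULL)
            continue;                       /* skip failed creations */

        QueryThreadCycleTime(h, &before);   /* thread may not have run yet */
        WaitForSingleObject(h, INFINITE);
        QueryThreadCycleTime(h, &after);    /* thread has finished */
        CloseHandle(h);

        early[n] = before;
        late[n] = after;
        n++;
    }

    printf("early queries reading 0: %zu of %zu\n", count_zero_samples(early, n), n);
    printf("late queries reading 0:  %zu of %zu\n", count_zero_samples(late, n), n);
    return 0;
}
#endif
```

How many early queries read zero will vary from run to run and machine to machine, which is exactly why the early number is not a useful measurement.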

If you want to time thread creation then time how long it takes for CreateThread to return. Or even better, measure the wall clock time that elapses between the call to CreateThread being made, and the thread starting execution.

For instance, the code might look like this:

#include <Windows.h>
#include <stdio.h>
#include <conio.h>

#define RUN 100000

DWORD WINAPI Calc(LPVOID Param){
    /* record the counter value as soon as the thread starts executing */
    QueryPerformanceCounter((LARGE_INTEGER*)Param);
    return 0;
}

int main(int argc, char* argv[])
{
    ULONG64 Sum = 0;

    for (int i = 0; i < RUN; i++)
    {
        LARGE_INTEGER PerformanceCountBeforeCreateThread, PerformanceCountWhenThreadStartsExecuting;
        DWORD ThreadId;
        HANDLE ThreadHandle;

        /* create the thread */
        QueryPerformanceCounter(&PerformanceCountBeforeCreateThread);
        ThreadHandle = CreateThread(NULL, 0, Calc,
            &PerformanceCountWhenThreadStartsExecuting, 0, &ThreadId);     
        WaitForSingleObject(ThreadHandle, INFINITE);
        CloseHandle(ThreadHandle);

        Sum += PerformanceCountWhenThreadStartsExecuting.QuadPart - PerformanceCountBeforeCreateThread.QuadPart;
    }

    printf_s("The Average no of counts in %d runs is %lu\n", RUN, (DWORD)(Sum/RUN));

    LARGE_INTEGER Frequency;
    QueryPerformanceFrequency(&Frequency);
    printf_s("Frequency %lld\n", Frequency.QuadPart);   /* QuadPart is a 64-bit LONGLONG */

    _getch();
    return 0;
}
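The simpler option mentioned earlier, timing how long CreateThread itself takes to return, might look roughly like the following. Treat it as a sketch under my own assumptions: the helper ticks_to_microseconds, the RUN value of 10000 and the variable names are made up for illustration. The helper divides by the value from QueryPerformanceFrequency so that the result is reported in microseconds rather than raw counts, and the Windows-specific part is guarded by _WIN32.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original code): converts
   performance-counter ticks to microseconds, given the counter
   frequency in ticks per second. */
static double ticks_to_microseconds(long long ticks, long long frequency)
{
    return (double)ticks * 1e6 / (double)frequency;
}

#ifdef _WIN32
#include <Windows.h>

#define RUN 10000   /* assumed run count, chosen only for this sketch */

static DWORD WINAPI Calc(LPVOID Param) { (void)Param; return 0; }

int main(void)
{
    LARGE_INTEGER Frequency, Before, After;
    long long Sum = 0;
    int Created = 0;

    QueryPerformanceFrequency(&Frequency);

    for (int i = 0; i < RUN; i++)
    {
        QueryPerformanceCounter(&Before);
        HANDLE h = CreateThread(NULL, 0, Calc, NULL, 0, NULL);
        QueryPerformanceCounter(&After);    /* CreateThread has returned */

        if (h == NULL)
            continue;                       /* ignore failed creations */
        WaitForSingleObject(h, INFINITE);
        CloseHandle(h);

        Sum += After.QuadPart - Before.QuadPart;
        Created++;
    }

    if (Created > 0)
        printf("CreateThread returned after %.2f us on average (%d runs)\n",
               ticks_to_microseconds(Sum, Frequency.QuadPart) / Created, Created);
    return 0;
}
#endif
```

This only measures the cost seen by the calling thread. It says nothing about when the new thread actually starts running, which is what the version above measures.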

Upvotes: 2
