Reputation: 543

How to measure the number of increments per second

I want to measure the speed at which my PC can increment a counter N times (e.g., for N = 10^9).

I tried the following code:

using namespace std;
auto start = chrono::steady_clock::now();
for (int i = 0; i < N; ++i)
{
}
auto end = chrono::steady_clock::now();

However, the compiler is smart enough to simply set i=N, and I get that start==end regardless of the value of N.

How can I change the code to measure the increment speed? (adding costly operations in the loop would dominate the runtime and would not allow the measurement to be correct).

I use Windows 10 and Visual Studio 15.9.7.

A bit of motivation: my code takes about 2 seconds for N=10^9. I'm wondering if there's any "meat" left in optimizing it further (e.g., could it possibly go down to 1 sec? or would the loop itself require more?)

Upvotes: 0

Answers (3)

Andrew Bainbridge

Reputation: 4808

This question doesn't really make sense in C or C++. The compiler aims to generate the fastest code that meets the constraints defined by your source code. In your question, you do not define a constraint that the compiler must do a loop at all. Because the loop has no effect, the optimizer will remove it.

Gabriel Staples's answer is probably the nearest thing you can get to a sensible answer to your question, but it is also not quite right, because it defines too many constraints, which limit the compiler's freedom to implement optimal code. Declaring the counter volatile often forces the compiler to write the result back to memory each time the variable is modified.

E.g., this code:

void foo(int N) {
    for (volatile int i = 0; i < N; ++i)
    {
    }
}

Becomes this assembly (on an x64 compiler I tried):

        mov     DWORD PTR [rsp-4], 0
        mov     eax, DWORD PTR [rsp-4]
        cmp     edi, eax
        jle     .L1
.L3:
        mov     eax, DWORD PTR [rsp-4] # Read i from mem
        add     eax, 1                 # i++
        mov     DWORD PTR [rsp-4], eax # Write i to mem
        mov     eax, DWORD PTR [rsp-4] # Read it back again before
                                       # evaluating the loop condition.
        cmp     eax, edi               # Is i < N?
        jl      .L3                    # Jump back to .L3 if so.
.L1:

It sounds like your real question is more like how fast is:

L1:    add     eax, 1
       jmp     L1

Even the answer to that is complex and requires an understanding of the internals of your CPU's pipelines.

I recommend playing with Godbolt to understand more about what the compiler is doing, e.g. https://godbolt.org/z/59XUSu
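As a rough sanity check on the numbers in the question, the measured time can be turned into a per-iteration cost. This is only a sketch: the helper names are made up for illustration, and clock_hz is an assumed core frequency that the caller has to supply, since nothing here measures it.

```cpp
#include <cassert>
#include <cstdint>

// Average time per loop iteration, in nanoseconds.
double ns_per_iteration(double seconds, std::uint64_t iterations)
{
    assert(iterations > 0); // guard against dividing by zero
    return seconds * 1e9 / static_cast<double>(iterations);
}

// Average clock cycles per iteration. clock_hz is an ASSUMED core
// frequency passed in by the caller; it is not measured here.
double cycles_per_iteration(double seconds, std::uint64_t iterations, double clock_hz)
{
    assert(iterations > 0);
    return seconds * clock_hz / static_cast<double>(iterations);
}
```

With the 2 seconds for N = 10^9 from the question, ns_per_iteration(2.0, 1000000000) comes out at about 2 ns per iteration. On a hypothetical 3 GHz core, cycles_per_iteration(2.0, 1000000000, 3e9) would be about 6 cycles per iteration.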

Upvotes: 4

user555045

Reputation: 64913

You can directly measure the speed of the "empty loop", but it is not easy to convince a C++ compiler to emit it. GCC and Clang can be tricked with asm volatile(""), but MSVC inline assembly has always been different and is disabled completely for 64-bit programs.

It is possible to use MASM to side-step that restriction:

.MODEL FLAT
.CODE

_testfun PROC
    sub ecx, 1
    jnz _testfun
    ret
_testfun ENDP

END

Import it into your code with extern "C" void testfun(unsigned N);.
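For GCC or Clang, the asm volatile("") trick mentioned at the top of this answer could look roughly like the sketch below. It will not build with MSVC, and the function names are hypothetical ones chosen for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

#if !defined(__GNUC__)
#error "This sketch needs GNU-style inline asm (GCC or Clang), not MSVC."
#endif

// Counts up to n. The empty asm statement is opaque to the optimizer,
// so it has to be executed once per iteration and the loop is kept.
std::uint64_t count_with_asm_barrier(std::uint64_t n)
{
    std::uint64_t i = 0;
    for (; i < n; ++i)
    {
        asm volatile("");
    }
    return i;
}

// Hypothetical helper: seconds taken by n iterations of the loop above.
double seconds_for_asm_loop(std::uint64_t n)
{
    const auto start = std::chrono::steady_clock::now();
    const std::uint64_t counted = count_with_asm_barrier(n);
    const auto end = std::chrono::steady_clock::now();
    assert(counted == n); // sanity check: the loop ran to completion
    (void)counted;        // avoid an unused-variable warning under NDEBUG
    return std::chrono::duration<double>(end - start).count();
}
```

Unlike a volatile counter, this leaves the compiler free to keep i in a register. Check the generated assembly before trusting the numbers.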

Upvotes: 1

Gabriel Staples

Reputation: 53115

Try volatile int i = 0 in your for loop. The volatile keyword tells the compiler that this variable could change at any time, due to outside events or threads, and therefore it can't make the same assumptions about what the variable might be in the future.
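A minimal sketch of that suggestion combined with the timing code from the question might look like this. The struct and function names are invented for the example, and the counter is 64-bit here so that large N cannot overflow it.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical result type: final counter value and elapsed seconds.
struct IncrementTiming
{
    std::int64_t count;
    double seconds;
};

// Times n increments of a volatile counter using steady_clock.
IncrementTiming time_volatile_increments(std::int64_t n)
{
    assert(n >= 0);
    volatile std::int64_t i = 0;
    const auto start = std::chrono::steady_clock::now();
    // i = i + 1 rather than ++i: increment of a volatile is deprecated in C++20.
    for (; i < n; i = i + 1)
    {
        // Empty on purpose: every read and write of i must really happen.
    }
    const auto end = std::chrono::steady_clock::now();
    return {i, std::chrono::duration<double>(end - start).count()};
}
```

Dividing count by seconds gives increments per second. The figure includes the memory reads and writes that volatile forces, so it is likely a pessimistic bound on a plain register increment.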

Upvotes: 0
