user3340037

Reputation: 731

Instruction counts by type: why so many memory operations?

I have a program that is basically a loop doing a TON of adds in each iteration.

So something like b += .01 happens probably 100 times per iteration.

So I expect the ratio of compute instructions (adds) to load and store instructions to be very high. However, unexpectedly, the more additions I do, the more memory reads and writes I get.

int b = 0;
int i;
for (i = 0; i < 100000; i++){
    b += .01;  /* repeated maybe 50 times */
}

I'm using the Pin tool, and the memory reads and writes go up by a lot, much faster than the additions, and I'm confused. I thought b was a local variable and as such, wasn't stored in memory but rather just the stack or in a cache. Why is this occurring?

I've looked at the assembly, and I see no usage of lw or sw anywhere.

Upvotes: 1

Views: 100

Answers (1)

tux3

Reputation: 7330

Unless optimizations are enabled, compilers almost always keep variables with automatic storage duration (e.g. int b = 0;) on the stack, and GCC does not optimize by default.

For example, if I compile this snippet with GCC, which is close to what you wrote but a little more correct:

int main()
{
    int b = 0;
    int i;
    for (i = 0; i < 100000; i++) {
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
    }
    return b;
}

I get the following compiled code:

00000000004004b6 <main>:
  4004b6:       55                      push   %rbp
  4004b7:       48 89 e5                mov    %rsp,%rbp
  4004ba:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
  4004c1:       c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
  4004c8:       eb 2c                   jmp    4004f6 <main+0x40>
  4004ca:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004ce:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004d2:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004d6:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004da:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004de:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004e2:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004e6:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004ea:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004ee:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4004f2:       83 45 f8 01             addl   $0x1,-0x8(%rbp)
  4004f6:       81 7d f8 9f 86 01 00    cmpl   $0x1869f,-0x8(%rbp)
  4004fd:       7e cb                   jle    4004ca <main+0x14>
  4004ff:       8b 45 fc                mov    -0x4(%rbp),%eax
  400502:       5d                      pop    %rbp
  400503:       c3                      retq   
  400504:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40050b:       00 00 00 
  40050e:       66 90                   xchg   %ax,%ax

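Note the addl $0x1,-0x4(%rbp) instructions: they increment our variable and are the equivalent of b++ in the source. The operand -0x4(%rbp) is a stack slot, so each of these instructions counts as both a load and a store. The loop counter i lives on the stack as well (-0x8(%rbp)), so its increment and the cmpl add further memory accesses. This is why you see such a high count of loads and stores, and why the count grows with every addition you put in the loop.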
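To make the arithmetic concrete, here is a rough sketch that estimates the data reads and writes issued by the loop in the listing above. The struct and function names are hypothetical, and the estimate deliberately ignores the function prologue and epilogue, the C runtime startup code, and instruction fetches, so the numbers Pin reports will be somewhat higher.

```c
#include <assert.h>

/* Hypothetical helper type: estimated data accesses made by the loop. */
struct mem_ops {
    long reads;
    long writes;
};

/* Rough estimate for the unoptimized listing, where both b and i live on
 * the stack. adds_on_b is the number of b++ statements in the loop body. */
struct mem_ops estimate_loop_mem_ops(long iterations, long adds_on_b)
{
    struct mem_ops ops;
    long rmw_per_iteration;

    assert(iterations >= 0 && adds_on_b >= 0);

    /* Each "addl $0x1, <stack slot>" reads the old value and writes the
     * new one. Per iteration: adds_on_b increments of b, plus one of i. */
    rmw_per_iteration = adds_on_b + 1;
    ops.writes = iterations * rmw_per_iteration;

    /* The cmpl on i only reads. It runs once more than the loop body,
     * because the final, failing comparison is executed too. */
    ops.reads = iterations * rmw_per_iteration + (iterations + 1);
    return ops;
}
```

For the listing above, estimate_loop_mem_ops(100000, 10) gives 1,200,001 reads and 1,100,000 writes. With 50 increments per iteration it gives 5,200,001 reads and 5,100,000 writes, which matches the observation that memory traffic grows along with the number of additions.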

If you don't want your variable to go on the stack, you can enable optimizations (e.g. -O2 with GCC) and let the compiler allocate registers, or you can pass a hint with the register keyword, like this:

int main()
{
    register int b = 0;
    int i;
    for (i = 0; i < 100000; i++) {
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
    }
    return b;
}

And you get the following compiled code:

00000000004004b6 <main>:
  4004b6:       55                      push   %rbp
  4004b7:       48 89 e5                mov    %rsp,%rbp
  4004ba:       53                      push   %rbx
  4004bb:       bb 00 00 00 00          mov    $0x0,%ebx
  4004c0:       c7 45 f4 00 00 00 00    movl   $0x0,-0xc(%rbp)
  4004c7:       eb 22                   jmp    4004eb <main+0x35>
  4004c9:       83 c3 01                add    $0x1,%ebx
  4004cc:       83 c3 01                add    $0x1,%ebx
  4004cf:       83 c3 01                add    $0x1,%ebx
  4004d2:       83 c3 01                add    $0x1,%ebx
  4004d5:       83 c3 01                add    $0x1,%ebx
  4004d8:       83 c3 01                add    $0x1,%ebx
  4004db:       83 c3 01                add    $0x1,%ebx
  4004de:       83 c3 01                add    $0x1,%ebx
  4004e1:       83 c3 01                add    $0x1,%ebx
  4004e4:       83 c3 01                add    $0x1,%ebx
  4004e7:       83 45 f4 01             addl   $0x1,-0xc(%rbp)
  4004eb:       81 7d f4 9f 86 01 00    cmpl   $0x1869f,-0xc(%rbp)
  4004f2:       7e d5                   jle    4004c9 <main+0x13>
  4004f4:       89 d8                   mov    %ebx,%eax
  4004f6:       5b                      pop    %rbx
  4004f7:       5d                      pop    %rbp
  4004f8:       c3                      retq   
  4004f9:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

Note that the increment instructions are now add $0x1,%ebx: our variable is indeed stored in a register (here ebx), as requested.
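For the other option mentioned above, enabling optimizations, a minimal sketch of the same loop without the register hint could look like this. The function name count_adds is hypothetical. When built with optimizations (for example gcc -O2), the compiler is free to keep both b and i in registers, or even to compute the result at compile time, so the loop may produce little or no memory traffic. What you actually get depends on the compiler and its version, so check the disassembly.

```c
/* Same loop as in the snippets above, as a plain function.
 * No register keyword: register allocation is left to the optimizer. */
int count_adds(void)
{
    int b = 0;
    int i;
    for (i = 0; i < 100000; i++) {
        /* ten increments per iteration, as in the listings above */
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
        b++;
    }
    return b;
}
```

The function returns 1000000 at any optimization level. Only the instruction mix changes, which is exactly what the Pin counts measure.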

"I thought b was a local variable and as such, wasn't stored in memory but rather just the stack or in a cache. Why is this occurring?"

In unoptimized builds, local variables are usually stored in memory (on the stack), but you can change this behavior. With the second snippet I posted, you'll see far fewer memory reads and writes, because b is no longer kept in memory but in a register. Only the loop counter i, which is still on the stack at -0xc(%rbp), keeps generating loads and stores.

Upvotes: 1
