REALFREE
REALFREE

Reputation: 4396

Understanding optimized assembly code generated by gcc

I'm trying to understand what kind of optimizations are performed by gcc when -O3 flag was set. I'm quite confused what these two lines,

xor %esi, %esi
lea 0x0(%esi), %esi

It seems to me redundant. What's point to use lea instruction here?

Upvotes: 2

Views: 507

Answers (2)

Leeor
Leeor

Reputation: 19706

As explained by ughoavgfhw - these are paddings for better code alignment. You can find this lea in the following link -

http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2010-September/003881.html

quote:

  1-byte: XCHG EAX, EAX
  2-byte: 66 NOP
  3-byte: LEA REG, 0 (REG) (8-bit displacement)
  4-byte: NOP DWORD PTR [EAX + 0] (8-bit displacement)
  5-byte: NOP DWORD PTR [EAX + EAX*1 + 0] (8-bit displacement)
**6-byte: LEA REG, 0 (REG) (32-bit displacement)**
  7-byte: NOP DWORD PTR [EAX + 0] (32-bit displacement)
  8-byte: NOP DWORD PTR [EAX + EAX*1 + 0] (32-bit displacement)
  9-byte: NOP WORD  PTR [EAX + EAX*1 + 0] (32-bit displacement)

Also note this SO question describing it in more details - What does NOPL do in x86 system?

Note that the xor itself is not a nop (it changes the value of the reg), but it is also very cheap to perform since it's a zero idiom - What is the purpose of XORing a register with itself?

Upvotes: 1

ughoavgfhw
ughoavgfhw

Reputation: 39905

That instruction is used to fill space for alignment purposes. Loops can be faster when they start on aligned addresses, because the processor loads memory into the decoder in chunks. By aligning the beginnings of loops and functions, it becomes more likely that they will be at the beginning of one of these chunks. This prevents previous instructions which will not be used from being loaded, maximizes the number of future instructions that will, and, possibly most importantly, ensures that the first instruction is entirely in the first chunk, so it does not take two loads to execute it.

The compiler knows that it is best to align the loop, and has two options to do so. It can either place a jump to the beginning of the loop, or fill the gap with no-ops and let the processor flow through them. Jump instructions break the flow of instructions and often cause wasted cycles on modern processors, so adding them unnecessarily is inadvisable. For a short distance like this no-ops are better.

The x86 architecture contains an instruction specifically for the purpose of doing nothing, nop. However, this is one byte long, so it would take more than one to align the loop. Decoding each one and deciding it does nothing takes time, so it is faster to simply insert another longer instruction that has no side effects. Therefore, the compiler inserted the lea instruction you see. It has absolutely no effects, and is chosen by the compiler to have the exact length required. In fact, recent processors have standard multi-byte no-op instructions, so this will likely be recognized during decode and never even executed.

Upvotes: 4

Related Questions