vpostman

Reputation: 101

dummy movups generated by gcc

A little curiosity I found: GCC seems to generate the following code when I have a lot of optimization flags on:

00000000004019ae:   test %si,%si
00000000004019b1:   movups %xmm0,%xmm0
00000000004019b4:   je 0x401f40 <main(int, char**)+1904>

Question: what purpose does the second instruction serve? It doesn't look like it /does/ anything; so, is it some optimization to align the program in the instruction cache? Or is it something with out-of-order execution? (I'm compiling with -mtune=native on a Nehalem if that helps :D).

Nothing urgent, of course, just curious.

Upvotes: 4

Views: 491

Answers (2)

PhiS

Reputation: 4650

Adding to the hypothesis proposed by Evgeny Kluev, other possibilities (in no particular order) are that (a) it's a compiler optimiser bug, (b) the movups is inserted to break a dependency, or (c) it is inserted for the purpose of code alignment.

Upvotes: 2

Evgeny Kluev

Reputation: 24647

Possibly xmm0 holds the result of a calculation done in the integer domain (with an integer SSE instruction), and the next instruction that uses xmm0 is expected to be in the floating-point domain (a floating-point SSE instruction).

Nehalem may execute that next instruction faster if xmm0 is first migrated to the floating-point domain with an instruction like movaps or movups. It may also be beneficial to perform this migration before the conditional jump, so that it is done only once. Without the movups, the migration may be done twice (automatically, by the first FP instruction that uses the register): once speculatively on a mispredicted branch, and again on the correct branch.

It seems the compiler decided that shortening the calculation dependency chains is worth more than saving code size and execution resources.
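To make the domain-crossing idea concrete, here is a minimal sketch in C with SSE2 intrinsics of a value produced by an integer SSE instruction and then consumed by a floating-point one. The helper names (add_ints, consume_as_float, domain_cross_demo) are made up for illustration and are not from the question's program. Nothing here forces GCC to emit the movups; it only shows the kind of integer-to-FP handoff that this hypothesis assumes happened in the original code.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */

/* Integer-domain producer: an integer SIMD add (typically paddd). */
static __m128i add_ints(__m128i a, __m128i b) {
    return _mm_add_epi32(a, b);
}

/* FP-domain consumer: reinterpret the same 128 bits as four floats
   (the cast itself emits no instruction) and do a float add (typically addps). */
static __m128 consume_as_float(__m128i bits, __m128 addend) {
    return _mm_add_ps(_mm_castsi128_ps(bits), addend);
}

/* Hypothetical demo: pass the bits of x through an integer add of zero
   (which leaves them unchanged), then add y in the FP domain.
   Returns lane 0 of the result, i.e. x + y. */
static float domain_cross_demo(float x, float y) {
    __m128i xbits = _mm_castps_si128(_mm_set1_ps(x));
    __m128i same  = add_ints(xbits, _mm_setzero_si128());
    __m128  sum   = consume_as_float(same, _mm_set1_ps(y));
    return _mm_cvtss_f32(sum);
}
```

Calling domain_cross_demo(1.5f, 2.0f) gives 3.5f. The result is the same whether or not a register-to-register move sits between the two instructions; if the hypothesis is right, the only difference would be in latency on the affected microarchitecture.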

Upvotes: 6
