Reputation: 71
I want to know whether, with the -Ofast flag on gcc, the code
x += (a * b) + (c * d) + (e * f);
is faster than, slower than, or the same speed as this code:
x += a * b;
x += c * d;
x += e * f;
I have a math expression like this inside a nested loop, so any gain in speed might have a significant effect.
Upvotes: 1
Views: 171
Reputation: 373382
Intuitively, I'd expect these to compile to the same code. But let's see what actually happens! Using godbolt with your first version (the one-liner), we get this code:
mov eax, DWORD PTR [rsp+20]
mov esi, DWORD PTR [rsp+28]
imul esi, DWORD PTR [rsp+32]
imul eax, DWORD PTR [rsp+24]
lea eax, [rax+rsi]
mov esi, DWORD PTR [rsp+36]
imul esi, DWORD PTR [rsp+40]
add esi, eax
add esi, DWORD PTR [rsp+44]
mov DWORD PTR [rsp+44], esi
With the second version, we get this:
mov esi, DWORD PTR [rsp+28]
imul esi, DWORD PTR [rsp+32]
mov eax, DWORD PTR [rsp+20]
imul eax, DWORD PTR [rsp+24]
add eax, DWORD PTR [rsp+44]
lea eax, [rax+rsi]
mov esi, DWORD PTR [rsp+36]
imul esi, DWORD PTR [rsp+40]
add esi, eax
mov DWORD PTR [rsp+44], esi
These are, I believe, the same ten instructions in a slightly different order. I suspect the performance would be almost identical in these two cases, though the different ordering might make a slight difference in how well the instructions pipeline.
I suspect that your first version is perfectly fine here.
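If you want to check this yourself, here is a minimal sketch of the two forms wrapped in functions. It assumes the operands are plain ints (which the 32-bit imul instructions above suggest) and that the second term in the three-line version is meant to be c * d, as in the one-liner. The function names are made up for illustration.

```c
/* Hypothetical wrappers around the two forms from the question.
   Assumption: all operands are int, and the middle term is c * d. */

/* Version 1: everything in a single expression. */
int sum_one_line(int x, int a, int b, int c, int d, int e, int f)
{
    x += (a * b) + (c * d) + (e * f);
    return x;
}

/* Version 2: one accumulation per statement. */
int sum_three_lines(int x, int a, int b, int c, int d, int e, int f)
{
    x += a * b;
    x += c * d;
    x += e * f;
    return x;
}
```

Compiling this with something like gcc -Ofast -S and comparing the assembly for the two functions should show whether your compiler version treats them differently. With floating-point operands the picture may change, since -Ofast also turns on -ffast-math and lets the compiler reorder the additions, so it is worth inspecting the output for your actual types.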
Upvotes: 1