Reputation:
Recently I got a code example (see code #1) in my CS studies, and I'm trying to work out whether my second version of the code would be faster or use less memory. Do you have an answer for my question?
// Code #1
#include <math.h>

double f(double b, double x)
{
    double s;
    if (x == 0.0)
        s = 1.0;
    else
        s = sin(x) / x;
    return s + b;
}
// Code #2
double f(double b, double x)
{
    // I thought this would be faster and use less memory because it
    // doesn't declare an extra double.
    if (x == 0.0)
        return 1.0 + b;
    else
        return sin(x) / x + b;
}
Thank you guys for your help.
Upvotes: 1
Views: 220
Reputation: 34608
I generated the assembly for both cases.
Case 1:
push rbp
mov rbp, rsp
sub rsp, 48
movsd QWORD PTR [rbp-24], xmm0
movsd QWORD PTR [rbp-32], xmm1
pxor xmm0, xmm0
ucomisd xmm0, QWORD PTR [rbp-32]
jp .L2
pxor xmm0, xmm0
ucomisd xmm0, QWORD PTR [rbp-32]
jne .L2
movsd xmm0, QWORD PTR .LC1[rip]
movsd QWORD PTR [rbp-8], xmm0
jmp .L4
.L2:
mov rax, QWORD PTR [rbp-32]
mov QWORD PTR [rbp-40], rax
movsd xmm0, QWORD PTR [rbp-40]
call sin
divsd xmm0, QWORD PTR [rbp-32]
movsd QWORD PTR [rbp-8], xmm0
.L4:
movsd xmm0, QWORD PTR [rbp-8]
addsd xmm0, QWORD PTR [rbp-24]
leave
ret
.LC1:
.long 0
.long 1072693248
Case 2:
push rbp
mov rbp, rsp
sub rsp, 32
movsd QWORD PTR [rbp-8], xmm0
movsd QWORD PTR [rbp-16], xmm1
pxor xmm0, xmm0
ucomisd xmm0, QWORD PTR [rbp-16]
jp .L2
pxor xmm0, xmm0
ucomisd xmm0, QWORD PTR [rbp-16]
jne .L2
movsd xmm1, QWORD PTR [rbp-8]
movsd xmm0, QWORD PTR .LC1[rip]
addsd xmm0, xmm1
jmp .L4
.L2:
mov rax, QWORD PTR [rbp-16]
mov QWORD PTR [rbp-24], rax
movsd xmm0, QWORD PTR [rbp-24]
call sin
divsd xmm0, QWORD PTR [rbp-16]
addsd xmm0, QWORD PTR [rbp-8]
.L4:
leave
ret
.LC1:
.long 0
.long 1072693248
There is essentially no difference, so neither version is faster than the other. In the end, this kind of optimization is up to the compiler.
Upvotes: 1
Reputation: 85371
Short answer: don't worry about it!
Long answer:
sin will take by far the most time in this function, so a couple of additional instructions, if any, will not have any noticeable effect.
Though when in doubt, look at the generated code.
With GCC 6.3 on x86_64, the first version uses 1 more register (xmm2) but the optimizer is able to reorder instructions better.
Version 1:
ucomisd xmm1, QWORD PTR .LC1[rip]
movapd xmm2, xmm0
jp .L5
movsd xmm0, QWORD PTR .LC0[rip]
je .L7
.L5:
movapd xmm0, xmm1
sub rsp, 24
movsd QWORD PTR [rsp+8], xmm2
movsd QWORD PTR [rsp], xmm1
call sin
movsd xmm1, QWORD PTR [rsp]
movsd xmm2, QWORD PTR [rsp+8]
add rsp, 24
divsd xmm0, xmm1
addsd xmm0, xmm2
ret
.L7:
addsd xmm0, xmm2
ret
Version 2:
ucomisd xmm1, QWORD PTR .LC0[rip]
jp .L2
je .L10
.L2:
sub rsp, 24
movsd QWORD PTR [rsp], xmm0
movapd xmm0, xmm1
movsd QWORD PTR [rsp+8], xmm1
call sin
movsd xmm1, QWORD PTR [rsp+8]
divsd xmm0, xmm1
addsd xmm0, QWORD PTR [rsp]
add rsp, 24
ret
.L10:
addsd xmm0, QWORD PTR .LC1[rip]
ret
So how big is the performance difference between these two versions? Only a performance test can tell for sure (but my guess is you won't see any difference).
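If you do want to measure it, a minimal timing sketch could look like the following (the names f1 and f2, the iteration count, and the use of clock() are arbitrary choices for this example, not part of the original question):

#include <math.h>
#include <stdio.h>
#include <time.h>

// The two versions from the question, renamed so they can coexist.
double f1(double b, double x)
{
    double s;
    if (x == 0.0)
        s = 1.0;
    else
        s = sin(x) / x;
    return s + b;
}

double f2(double b, double x)
{
    if (x == 0.0)
        return 1.0 + b;
    else
        return sin(x) / x + b;
}

int main(void)
{
    const long N = 10000000L;   // calls per version
    volatile double sink = 0.0; // keeps the compiler from discarding the work

    clock_t t0 = clock();
    for (long i = 0; i < N; i++)
        sink += f1(0.5, (double)i);
    clock_t t1 = clock();
    for (long i = 0; i < N; i++)
        sink += f2(0.5, (double)i);
    clock_t t2 = clock();

    printf("f1: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("f2: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}

Build both runs into the same binary at the same optimization level, otherwise the comparison is not meaningful.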
Upvotes: 1
Reputation: 1260
Declaring a variable is not the only thing that occupies memory.
int a = 1 + 2;
int b = 3;
int c = a + b;
This code will end up taking the same amount of memory and work as the code below:
int c = 1 + 2 + 3;
This is because, in the end, the processor executes one operation at a time on a single core. The second snippet still adds two of the numbers, holds the intermediate result (on the stack or in a register), and then adds the third number to it.
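To illustrate that point, here is a small sketch (the function names are made up for this example): with optimization enabled, a compiler will typically fold both forms down to the same constant, so the named variables never make it into the generated code.

#include <stdio.h>

int with_temporaries(void)
{
    int a = 1 + 2;
    int b = 3;
    int c = a + b;
    return c;
}

int without_temporaries(void)
{
    return 1 + 2 + 3;
}

int main(void)
{
    // Both functions return 6; with optimization on, both typically
    // compile to the same "return a constant" code.
    printf("%d %d\n", with_temporaries(), without_temporaries());
    return 0;
}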
Upvotes: 1
Reputation: 811
It depends on the compiler and the optimization flags. In general, both versions will give the same result.
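As a quick sanity check of that claim, one could compare the two formulations directly for a few inputs (the sample values here are arbitrary, and the exact-equality assert assumes ordinary IEEE double arithmetic, as on x86-64):

#include <assert.h>
#include <math.h>

int main(void)
{
    double b = 0.25;
    double xs[] = { 0.0, 0.5, 1.0, -2.0, 100.0 };
    for (int i = 0; i < 5; i++) {
        double x = xs[i];
        double s = (x == 0.0) ? 1.0 : sin(x) / x;           // version 1: named temporary
        double r1 = s + b;
        double r2 = (x == 0.0) ? 1.0 + b : sin(x) / x + b;  // version 2: direct expression
        assert(r1 == r2);
    }
    return 0;
}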
Upvotes: 1