user8893205

Would it be faster or use less memory to return from each branch of an if/else?

Recently I got a code example (see Code #1) in my CS course, and I am trying to work out whether my second version (Code #2) would be faster or use less memory. Can anyone answer this?

// Code #1
#include <math.h>  // for sin()
double f(double b, double x)
{
    double s;
    if (x == 0.0)
        s = 1.0;
    else
        s = sin(x)/x;
    return s + b;
}

// Code #2
double f(double b, double x)
{
    // I thought this would be faster and use less memory because
    // it does not declare a new double
    if (x == 0.0)
        return 1.0 + b;
    else
        return sin(x)/x + b;
}

Thank you for your help.

Upvotes: 1

Views: 220

Answers (4)

msc

Reputation: 34608

I generated the assembly for both cases.

Case 1:

        push    rbp
        mov     rbp, rsp
        sub     rsp, 48
        movsd   QWORD PTR [rbp-24], xmm0
        movsd   QWORD PTR [rbp-32], xmm1
        pxor    xmm0, xmm0
        ucomisd xmm0, QWORD PTR [rbp-32]
        jp      .L2
        pxor    xmm0, xmm0
        ucomisd xmm0, QWORD PTR [rbp-32]
        jne     .L2
        movsd   xmm0, QWORD PTR .LC1[rip]
        movsd   QWORD PTR [rbp-8], xmm0
        jmp     .L4
.L2:
        mov     rax, QWORD PTR [rbp-32]
        mov     QWORD PTR [rbp-40], rax
        movsd   xmm0, QWORD PTR [rbp-40]
        call    sin
        divsd   xmm0, QWORD PTR [rbp-32]
        movsd   QWORD PTR [rbp-8], xmm0
.L4:
        movsd   xmm0, QWORD PTR [rbp-8]
        addsd   xmm0, QWORD PTR [rbp-24]
        leave
        ret
.LC1:
        .long   0
        .long   1072693248

Case 2:

        push    rbp
        mov     rbp, rsp
        sub     rsp, 32
        movsd   QWORD PTR [rbp-8], xmm0
        movsd   QWORD PTR [rbp-16], xmm1
        pxor    xmm0, xmm0
        ucomisd xmm0, QWORD PTR [rbp-16]
        jp      .L2
        pxor    xmm0, xmm0
        ucomisd xmm0, QWORD PTR [rbp-16]
        jne     .L2
        movsd   xmm1, QWORD PTR [rbp-8]
        movsd   xmm0, QWORD PTR .LC1[rip]
        addsd   xmm0, xmm1
        jmp     .L4
.L2:
        mov     rax, QWORD PTR [rbp-16]
        mov     QWORD PTR [rbp-24], rax
        movsd   xmm0, QWORD PTR [rbp-24]
        call    sin
        divsd   xmm0, QWORD PTR [rbp-16]
        addsd   xmm0, QWORD PTR [rbp-8]
.L4:
        leave
        ret
.LC1:
        .long   0
        .long   1072693248

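There is no meaningful difference. The two listings are almost identical: Case 1 only reserves a little more stack space (48 bytes instead of 32) and stores s to the stack before the final add. So neither version gives a real speed advantage; what you actually get depends on the compiler.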

Upvotes: 1

rustyx

Reputation: 85371

Short answer: don't worry about it!

Long answer:

sin will take by far the most time in this function, so a couple of additional instructions, if any, will not have any noticeable effect.

Though when in doubt, look at the generated code.

With GCC 6.3 on x86_64, the first version uses one more register (xmm2), but the optimizer is able to reorder the instructions better.

Version 1:

        ucomisd xmm1, QWORD PTR .LC1[rip]
        movapd  xmm2, xmm0
        jp      .L5
        movsd   xmm0, QWORD PTR .LC0[rip]
        je      .L7
.L5:
        movapd  xmm0, xmm1
        sub     rsp, 24
        movsd   QWORD PTR [rsp+8], xmm2
        movsd   QWORD PTR [rsp], xmm1
        call    sin
        movsd   xmm1, QWORD PTR [rsp]
        movsd   xmm2, QWORD PTR [rsp+8]
        add     rsp, 24
        divsd   xmm0, xmm1
        addsd   xmm0, xmm2
        ret
.L7:
        addsd   xmm0, xmm2
        ret

Version 2:

        ucomisd xmm1, QWORD PTR .LC0[rip]
        jp      .L2
        je      .L10
.L2:
        sub     rsp, 24
        movsd   QWORD PTR [rsp], xmm0
        movapd  xmm0, xmm1
        movsd   QWORD PTR [rsp+8], xmm1
        call    sin
        movsd   xmm1, QWORD PTR [rsp+8]
        divsd   xmm0, xmm1
        addsd   xmm0, QWORD PTR [rsp]
        add     rsp, 24
        ret
.L10:
        addsd   xmm0, QWORD PTR .LC1[rip]
        ret

So how much is the difference in performance between these two versions? Only a performance test can tell for sure (but my guess is you won't see any difference).

Upvotes: 1

Sunil Kanzar

Reputation: 1260

Declaring a variable is not the only thing that occupies memory.

int a = 1 + 2;
int b = 3;
int c = a + b;

This code will occupy the same memory as the code below:

int c = 1 + 2 + 3;

Because in the end a single core performs one operation at a time.
Conceptually, the second snippet adds two numbers, holds the intermediate result on the stack, and then adds the third number to that result.
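To compare the two snippets yourself, it may help to wrap each one in a function and inspect what the compiler generates for it. The sketch below does only that; the function names sum_with_locals and sum_inline are hypothetical. Whether the intermediate values end up in memory, in registers, or are folded into a constant is up to the compiler and its flags.

```c
/* The sum written with named intermediate variables. */
static int sum_with_locals(void)
{
    int a = 1 + 2;
    int b = 3;
    int c = a + b;
    return c;
}

/* The same sum written as a single expression. */
static int sum_inline(void)
{
    int c = 1 + 2 + 3;
    return c;
}
```

Both functions return 6; only the way the source names the intermediate results differs.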

Upvotes: 1

ivand58

Reputation: 811

It depends on the compiler and the optimization flags. In general, both versions will produce the same result.

Upvotes: 1
