Jayesh
Jayesh

Reputation: 4893

Which code will execute faster in C?

I saw the following code (Question 3 on PUZZLERSWORLD.COM site):

Code 1 :

for (i = 0; i < 1000; i++)
    for (j = 0; j < 100; j++)
        x = y;

Code 2 :

for (i = 0; i < 100; i++)
    for (j = 0; j < 1000; j++)
        x = y;

Which code will execute faster?

Options:

a) Code 1 and Code 2 are of same speed,
b) Code 1,
c) Code 2,
d) Can't Say

Answer:

c)

So, I have a question, Why is the second code faster than the first?

Upvotes: 1

Views: 1135

Answers (7)

chqrlie
chqrlie

Reputation: 144540

The question is missing important information for any documented answer.

If for example we have these programs:

int main(void) {
    int i, j, x, y;
    for (i = 0; i < 1000; i++)
        for (j = 0; j < 100; j++)
            x = y;
}

and

int main(void) {
    int i, j, x, y;
    for (i = 0; i < 100; i++)
        for (j = 0; j < 1000; j++)
            x = y;
}

Both have undefined behavior because they read the value of uninitialized variable y.

If we have this:

int main(void) {
    int i;
    unsigned char j;
    int x, y = 0;
    for (i = 0; i < 1000; i++)
        for (j = 0; j < 100; j++)
            x = y;
}

and

int main(void) {
    int i;
    unsigned char j;
    int x, y = 0;
    for (i = 0; i < 1000; i++)
        for (j = 0; j < 100; j++)
            x = y;
}

The second code runs forever and the first finishes instantly.

With decent compilers and correct definitions, the code is almost completely optimized out, so answer a) applies:

int main(void) {
    int i, j, x, y = 0;
    for (i = 0; i < 100; i++)
        for (j = 0; j < 1000; j++)
            x = y;
}

compiled with Godbolt's compiler explorer, produces:

main:
        xorl    %eax, %eax
        ret

The site expects you to chose c) because the initial overhead of the nested loop is repeated 10 times more in the second code fragment than in the first. While there is some logic in this analysis, it might not be the dominant factor and as explained above other considerations must be studied too, and in any case 900 extra instructions would be negligible compared to several hundreds of thousands for the rest of the code, making it hardly measurable.

In performance analysis, humility is a cardinal virtue. Use profiling tools and carefully write thorough benchmarking code to determine if optimization is useful at all. It is a common mistake to write complex code in a attempt at optimizing and get code that produces incorrect results.


Note that the linked page is full of broken questions. For example Q2:

#include
fun(int i)
{
    int j=0;
    while (i & i-1) {
        j++;   
    }
    printf("%d",j);
}

main(){
    f(100);
}

Options: a. 100 b. 99 c. 2 d. 3

The code does not compile, and were it corrected, it has an infinite loop so none of the answers are correct. For the curious, the correct function to compute the number of bits in i is:

void fun(unsigned i) {
    int j = 0;
    while (i) {
        i = i & (i - 1);
        j++;   
    }
    printf("%d\n", j);
}

Out of 10 questions, 7 have incorrect answers and the remaining 3 (Q1, Q4 and Q5) are trick questions with syntax errors. A very poor quality Q&A site indeed.

Upvotes: 1

Grzegorz Szpetkowski
Grzegorz Szpetkowski

Reputation: 37904

Unless x or y or both are declared as volatile, both codes can be reduced into:

x = y;

Example:

int f(int y)
{
    int x;

    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 100; j++)
            x = y;
    return x;
}

yields:

f(int):
        mov     eax, edi
        ret

Upvotes: 13

stefan bachert
stefan bachert

Reputation: 9608

A loop has some overhead

The overhead of the inner loop in code 1 is called 1000 times while in code 2 100 times.

Thatfore code 2 is faster or same

Even if the compiler optimizes very well and removes the loops, code 2 will never be slower

I choose option e)

PROVE

ASSUMPTIONS

  • using the compiler for both codes
  • when using optimisations then for both codes
  • using only rational optimisations, no random or weird ones
  • any code need time to execute

case a)

the compiler does not optimize at all, than the overhead of inner loop applies and code 2 ist faster. That would match option D) or E)

case b)

the compiler does optimization

b1) remove loops at all.

Than both codes result to the same. Options A) or E)

b2) combining both loops to one

Than both codes result to the same. Options A) or E)

b3) combining reordering the loops (this would be a silly sub-optimization in the original code)

Than both codes result to the same. Options A) or E)

So after combining all cases only E) is true


Fabel "counterexample" is not valid since there is no reason why applying an optimisation in code 1 but not in code 2

Upvotes: 1

0___________
0___________

Reputation: 67476

Depending on the type of the variables, as compiler will optimise the code different way.

The answer: if any optimisation are on, the code executes exactly the same for foo, if volatile there is a difference as the register is loaded more times

int x, y;
volatile int m,n;


void foo(void)
{
for(int i=0; i<1000; i++)
   for(int j=0; j<100; j++)
      x = y;


for(int i=0; i<100; i++)
   for(int j=0; j<1000; j++)
      x = y;
}

void foo1(void)
{
for(int i=0; i<1000; i++)
   for(int j=0; j<100; j++)
      m = n;


for(int i=0; i<100; i++)
   for(int j=0; j<1000; j++)
      m = n;
}

and the generated code ARM: (the x86 verison here: https://godbolt.org/g/tSRKxy).

foo():
        ldr     r3, .L2
        ldr     r2, [r3, #4]
        str     r2, [r3]
        bx      lr
.L2:
        .word   .LANCHOR0
foo1():
        mov     r0, #1000
        ldr     r3, .L14
.L6:
        mov     r2, #100
.L5:
        ldr     r1, [r3, #8]
        subs    r2, r2, #1
        str     r1, [r3, #12]
        bne     .L5
        subs    r0, r0, #1
        bne     .L6
        mov     r0, #100
.L8:
        mov     r2, #1000
.L7:
        ldr     r1, [r3, #8]
        subs    r2, r2, #1
        str     r1, [r3, #12]
        bne     .L7
        subs    r0, r0, #1
        bne     .L8
        bx      lr
.L14:
        .word   .LANCHOR0
x:
y:
n:
m:

Upvotes: 1

Fabel
Fabel

Reputation: 1761

Because initializing a loop requires a tiny amount of overhead (setting the counter variables to zero). In Code 1 that overhead is executed 1+1000 times, while in Code 2 just 1+100 times. Using a real compiler thou would at least remove the loops and reduce the code to just x = y; in both cases since it knows the result will be the same. Or if it couldn't do that (if inner loop contained say f(i,j);) it would perhaps decide to unroll the inner loop in Code 1 but not in Code 2 making Code 1 much faster (probably at least double the speed but it depends on the ISA and ABI of course).

Upvotes: 1

user2371524
user2371524

Reputation:

The question seems to assume that incrementing will be done e.g. in a register and it's more efficient not to have to switch between incrementing i and j and reinitializing j too often. These assumptions might hold on some hardware, but it's unlikely to make a measurable difference.

The correct answer would be: "a), with any decent compiler", because the code created by a decent compiler won't execute anything other than a single x = y; with this source. (so both snippets will compile to exactly the same executable code)


as for "how to check", you have two possibilities:

  • compile to assembly code (e.g. with gcc's -S option) and compare (this is a good way with code as simple as this).
  • measure. Run each code very often (like a million times) and take the time.

Upvotes: 6

CodeMonkey
CodeMonkey

Reputation: 4738

Answer d) is correct. In general you can't say much about performance without referring to all the details of, which compiler is used, which processor architecture is used etc.

I'd stop using that website as a learning resource and find another one.

Upvotes: 6

Related Questions