Reputation: 4893
I saw the following code (Question 3 on PUZZLERSWORLD.COM site):
Code 1 :
for (i = 0; i < 1000; i++)
for (j = 0; j < 100; j++)
x = y;
Code 2 :
for (i = 0; i < 100; i++)
for (j = 0; j < 1000; j++)
x = y;
Which code will execute faster?
Options:
a) Code 1 and Code 2 are of same speed,
b) Code 1,
c) Code 2,
d) Can't Say
Answer:
c)
So, I have a question, Why is the second code faster than the first?
Upvotes: 1
Views: 1135
Reputation: 144540
The question is missing important information for any documented answer.
If for example we have these programs:
int main(void) {
int i, j, x, y;
for (i = 0; i < 1000; i++)
for (j = 0; j < 100; j++)
x = y;
}
and
int main(void) {
int i, j, x, y;
for (i = 0; i < 100; i++)
for (j = 0; j < 1000; j++)
x = y;
}
Both have undefined behavior because they read the value of uninitialized variable y
.
If we have this:
int main(void) {
int i;
unsigned char j;
int x, y = 0;
for (i = 0; i < 1000; i++)
for (j = 0; j < 100; j++)
x = y;
}
and
int main(void) {
int i;
unsigned char j;
int x, y = 0;
for (i = 0; i < 1000; i++)
for (j = 0; j < 100; j++)
x = y;
}
The second code runs forever and the first finishes instantly.
With decent compilers and correct definitions, the code is almost completely optimized out, so answer a) applies:
int main(void) {
int i, j, x, y = 0;
for (i = 0; i < 100; i++)
for (j = 0; j < 1000; j++)
x = y;
}
compiled with Godbolt's compiler explorer, produces:
main:
xorl %eax, %eax
ret
The site expects you to chose c) because the initial overhead of the nested loop is repeated 10 times more in the second code fragment than in the first. While there is some logic in this analysis, it might not be the dominant factor and as explained above other considerations must be studied too, and in any case 900 extra instructions would be negligible compared to several hundreds of thousands for the rest of the code, making it hardly measurable.
In performance analysis, humility is a cardinal virtue. Use profiling tools and carefully write thorough benchmarking code to determine if optimization is useful at all. It is a common mistake to write complex code in a attempt at optimizing and get code that produces incorrect results.
Note that the linked page is full of broken questions. For example Q2:
#include
fun(int i)
{
int j=0;
while (i & i-1) {
j++;
}
printf("%d",j);
}
main(){
f(100);
}
Options: a. 100 b. 99 c. 2 d. 3
The code does not compile, and were it corrected, it has an infinite loop so none of the answers are correct. For the curious, the correct function to compute the number of bits in i
is:
void fun(unsigned i) {
int j = 0;
while (i) {
i = i & (i - 1);
j++;
}
printf("%d\n", j);
}
Out of 10 questions, 7 have incorrect answers and the remaining 3 (Q1, Q4 and Q5) are trick questions with syntax errors. A very poor quality Q&A site indeed.
Upvotes: 1
Reputation: 37904
Unless x
or y
or both are declared as volatile
, both codes can be reduced into:
x = y;
Example:
int f(int y)
{
int x;
for (int i = 0; i < 1000; i++)
for (int j = 0; j < 100; j++)
x = y;
return x;
}
yields:
f(int):
mov eax, edi
ret
Upvotes: 13
Reputation: 9608
A loop has some overhead
The overhead of the inner loop in code 1 is called 1000 times while in code 2 100 times.
Thatfore code 2 is faster or same
Even if the compiler optimizes very well and removes the loops, code 2 will never be slower
I choose option e)
the compiler does not optimize at all, than the overhead of inner loop applies and code 2 ist faster. That would match option D) or E)
the compiler does optimization
b1) remove loops at all.
Than both codes result to the same. Options A) or E)
b2) combining both loops to one
Than both codes result to the same. Options A) or E)
b3) combining reordering the loops (this would be a silly sub-optimization in the original code)
Than both codes result to the same. Options A) or E)
So after combining all cases only E) is true
Fabel "counterexample" is not valid since there is no reason why applying an optimisation in code 1 but not in code 2
Upvotes: 1
Reputation: 67476
Depending on the type of the variables, as compiler will optimise the code different way.
The answer: if any optimisation are on, the code executes exactly the same for foo, if volatile there is a difference as the register is loaded more times
int x, y;
volatile int m,n;
void foo(void)
{
for(int i=0; i<1000; i++)
for(int j=0; j<100; j++)
x = y;
for(int i=0; i<100; i++)
for(int j=0; j<1000; j++)
x = y;
}
void foo1(void)
{
for(int i=0; i<1000; i++)
for(int j=0; j<100; j++)
m = n;
for(int i=0; i<100; i++)
for(int j=0; j<1000; j++)
m = n;
}
and the generated code ARM: (the x86 verison here: https://godbolt.org/g/tSRKxy).
foo():
ldr r3, .L2
ldr r2, [r3, #4]
str r2, [r3]
bx lr
.L2:
.word .LANCHOR0
foo1():
mov r0, #1000
ldr r3, .L14
.L6:
mov r2, #100
.L5:
ldr r1, [r3, #8]
subs r2, r2, #1
str r1, [r3, #12]
bne .L5
subs r0, r0, #1
bne .L6
mov r0, #100
.L8:
mov r2, #1000
.L7:
ldr r1, [r3, #8]
subs r2, r2, #1
str r1, [r3, #12]
bne .L7
subs r0, r0, #1
bne .L8
bx lr
.L14:
.word .LANCHOR0
x:
y:
n:
m:
Upvotes: 1
Reputation: 1761
Because initializing a loop requires a tiny amount of overhead (setting the counter variables to zero). In Code 1 that overhead is executed 1+1000 times, while in Code 2 just 1+100 times. Using a real compiler thou would at least remove the loops and reduce the code to just x = y;
in both cases since it knows the result will be the same. Or if it couldn't do that (if inner loop contained say f(i,j);
) it would perhaps decide to unroll the inner loop in Code 1 but not in Code 2 making Code 1 much faster (probably at least double the speed but it depends on the ISA and ABI of course).
Upvotes: 1
Reputation:
The question seems to assume that incrementing will be done e.g. in a register and it's more efficient not to have to switch between incrementing i
and j
and reinitializing j
too often. These assumptions might hold on some hardware, but it's unlikely to make a measurable difference.
The correct answer would be: "a), with any decent compiler", because the code created by a decent compiler won't execute anything other than a single x = y;
with this source. (so both snippets will compile to exactly the same executable code)
as for "how to check", you have two possibilities:
gcc
's -S
option) and compare (this is a good way with code as simple as this).Upvotes: 6
Reputation: 4738
Answer d)
is correct. In general you can't say much about performance without referring to all the details of, which compiler is used, which processor architecture is used etc.
I'd stop using that website as a learning resource and find another one.
Upvotes: 6