Reputation: 21
mov eax,0
mov ecx,100
loop2_start: mov eax,ecx
add eax,ecx
imul eax,eax ;To act as n^2
jle loop2_start
loop2_end: call print_int
call print_nl
I'm trying to find sum of all the squares between 1 to 100. When I print this, it gives me 40000 when the answer should be 338350. What should I change to make it work? Perhaps my jump instruction?
Upvotes: 0
Views: 361
Reputation: 37232
Lets start with some maths.
For positive values of n
, if:
lastSquare = n*n
Then:
nextSquare = (n+1)*(n+1)
= (n * n) + (n * 1) + (1 * n) + 1
= (n * n) + 2 * n + 1
= lastSquare + 2 * n + 1
In other words, (using "C like" pseudo-code) you can do:
square = 0;
sum = 0;
for(n = 1-1; n < 100-1; n++) {
square = square + 2 * n + 1;
sum = sum + square;
}
For 80x86 assembly, square = square + 2 * n + 1
can be done in a single fast instruction - e.g. lea ebx,[ebx+ecx*2+1]
.
This gives something like (NASM syntax, untested):
; Loop preperation
xor ebx,ebx ;ebx = square = 0
xor eax,eax ;eax = sum = 0
xor ecx,ecx ;ecx = n = 1-1
; Loop
.next:
lea ebx,[ebx+ecx*2+1] ;ebx = square + 2 * n + 1
add eax,ebx ;eax = sum + square
; Loop end
inc ecx
cmp ecx,100-1
jb .next
However; the middle of the loop is small compared to the loop's overhead, so it'd be fun to unroll the loop a little to reduce the loop's overhead. 99 is odd, but is a multiple of 3, so:
; Loop preperation
xor ebx,ebx ;ebx = square = 0
xor eax,eax ;eax = sum = 0
xor ecx,ecx ;ecx = n = 1-1
; Loop
.next:
lea ebx,[ebx+ecx*2+1] ;ebx = square + 2 * n + 1
add eax,ebx ;eax = sum + square
lea ebx,[ebx+ecx*2+2+1] ;ebx = square + 2 * (n+1) + 1
add eax,ebx ;eax = sum + square
lea ebx,[ebx+ecx*2+4+1] ;ebx = square + 2 * (n+2) + 1
add eax,ebx ;eax = sum + square
; Loop end
add ecx,3
cmp ecx,100-1
jb .next
This allows you to use maths to cheat more! Specifically:
nextSquare = (n+1)*(n+1)
= (n * n) + (n * 1) + (1 * n) + 1
= (n * n) + 2 * n + 1
= lastSquare + 2 * n + 1
nextNextSquare = (n+2)*(n+2)
= (n * n) + (n * 2) + (2 * n) + 2*2
= (n * n) + 4 * n + 4
= lastSquare + 4 * n + 4
nextNextNextSquare = (n+3)*(n+3)
= (n * n) + (n * 3) + (3 * n) + 3*3
= (n * n) + 6 * n + 9
= lastSquare + 6 * n + 9
nextAddend = nextSquare + nextNextSquare + nextNextNextSquare
= lastSquare + 2 * n + 1 + lastSquare + 4 * n + 4 + lastSquare + 6 * n + 9
= 3 * lastSquare + 12 * n + 14
= 3 * (lastSquare + 4 * n) + 14
This leads to something like (NASM syntax, untested):
; Loop preperation
xor ebx,ebx ;ebx = square = 0
xor eax,eax ;eax = sum = 0
xor ecx,ecx ;ecx = n = 1-1
; Loop
.next:
lea edx,[ebx+ecx*4] ;edx = square + 4 * n
lea edx,[edx*2+edx+14] ;edx = 3 * (square + 4 * n) + 14
add eax,edx ;eax = sum + nextSquare + nextNextSquare + nextNextNextSquare
lea edx,[ecx*2+ecx] ;edx = 2*n
lea ebx,[ebx+2*edx+9] ;ebx = lastSquare + 6 * n + 9 = nextNextNextSquare
; Loop end
add ecx,3
cmp ecx,100-1
jb .next
Of course by unrolling the loop more you get to cheat more; but if you keep doing that and optimizing you end up at "fully unrolled" (no more loop) where the code becomes a single "mov eax,precalculated_constant
" instruction.
Upvotes: 2