user14358127
user14358127

Reputation: 21

Sum of all the squares between 1 to 100 in assembly

 mov eax,0
 mov ecx,100 

loop2_start: mov eax,ecx
             add eax,ecx 
             imul eax,eax  ;To act as n^2
             jle loop2_start

loop2_end: call print_int
           call print_nl

I'm trying to find sum of all the squares between 1 to 100. When I print this, it gives me 40000 when the answer should be 338350. What should I change to make it work? Perhaps my jump instruction?

Upvotes: 0

Views: 361

Answers (1)

Brendan
Brendan

Reputation: 37232

Lets start with some maths.

For positive values of n, if:

lastSquare = n*n

Then:

nextSquare = (n+1)*(n+1)
           = (n * n) + (n * 1) + (1 * n) + 1
           = (n * n) + 2 * n + 1
           = lastSquare + 2 * n + 1

In other words, (using "C like" pseudo-code) you can do:

    square = 0;
    sum = 0;
    for(n = 1-1; n < 100-1; n++) {
        square = square + 2 * n + 1;
        sum = sum + square;
    }

For 80x86 assembly, square = square + 2 * n + 1 can be done in a single fast instruction - e.g. lea ebx,[ebx+ecx*2+1].

This gives something like (NASM syntax, untested):

; Loop preperation

    xor ebx,ebx           ;ebx = square = 0
    xor eax,eax           ;eax = sum = 0
    xor ecx,ecx           ;ecx = n = 1-1

; Loop

.next:
    lea ebx,[ebx+ecx*2+1] ;ebx = square + 2 * n + 1
    add eax,ebx           ;eax = sum + square

; Loop end

    inc ecx
    cmp ecx,100-1
    jb .next

However; the middle of the loop is small compared to the loop's overhead, so it'd be fun to unroll the loop a little to reduce the loop's overhead. 99 is odd, but is a multiple of 3, so:

; Loop preperation

    xor ebx,ebx           ;ebx = square = 0
    xor eax,eax           ;eax = sum = 0
    xor ecx,ecx           ;ecx = n = 1-1

; Loop

.next:
    lea ebx,[ebx+ecx*2+1]   ;ebx = square + 2 * n + 1
    add eax,ebx             ;eax = sum + square
    lea ebx,[ebx+ecx*2+2+1] ;ebx = square + 2 * (n+1) + 1
    add eax,ebx             ;eax = sum + square
    lea ebx,[ebx+ecx*2+4+1] ;ebx = square + 2 * (n+2) + 1
    add eax,ebx             ;eax = sum + square

; Loop end

    add ecx,3
    cmp ecx,100-1
    jb .next

This allows you to use maths to cheat more! Specifically:

nextSquare = (n+1)*(n+1)
           = (n * n) + (n * 1) + (1 * n) + 1
           = (n * n) + 2 * n + 1
           = lastSquare + 2 * n + 1

nextNextSquare = (n+2)*(n+2)
               = (n * n) + (n * 2) + (2 * n) + 2*2
               = (n * n) + 4 * n + 4
               = lastSquare + 4 * n + 4

nextNextNextSquare = (n+3)*(n+3)
                   = (n * n) + (n * 3) + (3 * n) + 3*3
                   = (n * n) + 6 * n + 9
                   = lastSquare + 6 * n + 9

nextAddend = nextSquare  + nextNextSquare + nextNextNextSquare
           = lastSquare + 2 * n + 1 + lastSquare + 4 * n + 4 + lastSquare + 6 * n + 9
           = 3 * lastSquare + 12 * n + 14
           = 3 * (lastSquare + 4 * n) + 14

This leads to something like (NASM syntax, untested):

; Loop preperation

    xor ebx,ebx           ;ebx = square = 0
    xor eax,eax           ;eax = sum = 0
    xor ecx,ecx           ;ecx = n = 1-1

; Loop

.next:
    lea edx,[ebx+ecx*4]      ;edx = square + 4 * n
    lea edx,[edx*2+edx+14]   ;edx = 3 * (square + 4 * n) + 14
    add eax,edx              ;eax = sum + nextSquare  + nextNextSquare + nextNextNextSquare
    lea edx,[ecx*2+ecx]      ;edx = 2*n
    lea ebx,[ebx+2*edx+9]    ;ebx = lastSquare + 6 * n + 9 = nextNextNextSquare


; Loop end

    add ecx,3
    cmp ecx,100-1
    jb .next

Of course by unrolling the loop more you get to cheat more; but if you keep doing that and optimizing you end up at "fully unrolled" (no more loop) where the code becomes a single "mov eax,precalculated_constant" instruction.

Upvotes: 2

Related Questions