Alexis
Alexis

Reputation: 57

Mean of 3 numbers in assembly

I'm a beginner in assembly and I'm having difficulties with an exercise where I have to implement a filter of an image encoded in BMP in assembly. Each pixel is encoded on 24 bits so that, each component (blue, green, red) of the pixel is encoded on 8 bits. To represent the image I have an array of uint8_t. For example the first pixel of the array is represented by array[0] (blue component), array[1] (green component) and array[2] (red component). What I have to do is to implement a filter that finds the mean of the 3 components of each pixel and fix the value of each component to that mean. My problem is then to calculate that mean. The signature of the function is extern size_t filter(const uint8_t* in_buf, uint32_t w, uint32_t h); Here is my code.

.text
.global filter

filter:
    push %ebp
    mov %esp, %ebp
    xor %ecx, %ecx #increment variable for_width = 0
    for_width:
            xor %ebx, %ebx #increment variable for_height = 0
            for_height:
                    calcul_offset: # find the position of the position of an i, j pixel
                            mov %ecx, %esi
                            mov %ebx, %edx
                            imull $3, %esi
                            imull $3, %edx
                            imull 12(%ebp), %edx #12(%ebp) = width of the image
                            add %edx, %esi
                    calcul_mean:
                            mov %esi, %edx
                            add 8(%esi), %edx
                            add 16(%esi), %edx #edx contains the sum of the 3 components
                    change_pixel: # supposing edx contains the mean of the 3 components
                            mov 8(%ebp), %esi #8(%ebp) is the adress of the array
                            mov %edx, (%esi) #blue component
                            mov %edx, 8(%esi) #green component
                            mov %edx, 16(%esi) #red component
                            #here is my problem - to divide %esi by 3
                    inc %ebx
                    cmp 16(%ebp), %ebx #16(%ebp) = height of the image
                    jle for_height
            inc %ecx
            cmp 12(%ebp), %ecx
            jle for_width
    # retour
    ret

Note that my code is not complete yet and that i m just trying to figure out the division at the moment.

Thank you ! Alexis

Upvotes: 1

Views: 402

Answers (2)

Peter Cordes
Peter Cordes

Reputation: 365267

Since you asked for feedback on your code: instead of multiplying the loop counter by stuff every iteration, use an add to walk a pointer through the array. This is called loop strength-reduction, because add is a less-expensive operation than imul.

Even without strength-reducing, those mov-and-imul sequences are really silly. imul-immediate is a 3-operand instruction, with a write-only destination. imul $3, %esi is just short-hand for imul $3, %esi, %esi. So obviously you could drop the mov %ecx, %esi. You can also use LEA to multiply by small constants. It's more obvious in Intel syntax (lea esi, [ecx + ecx*2]), but lea (%ecx, %ecx, 2), %esi will do the trick.

You can probably also take advantage of 2-register addressing modes instead of doing so much work to get the final address into %esi. However it might be better to stick to one-register addressing modes for perf reasons on Intel SnB-family CPUs.


I thought you said your values were 1 byte uint8_t color components? add 8(%esi), %edx has a 4B memory source operand. I haven't read your code carefully, but if you want to treat color components separately, you need add 8(%esi), %dl and similar.

This is a good candidate for SIMD-vectorization.


div is very slow. It's probably faster to replace the actual divide by a multiply and shift, like any good compiler does: clang-3.8 output from that godbolt link:

unsigned div3_32b(unsigned a) { return a/3; }
        movl    %edi, %ecx
        movl    $2863311531, %eax       # imm = 0xAAAAAAAB  
        imulq   %rcx, %rax
        shrq    $33, %rax
        retq

gcc uses a 32bit one-operand mul instead of 64b imul, so have a look at that in the godbolt link if that matters, otherwise 64b-imul will be faster on most CPUs, IIRC. (see the tag wiki for optimization links.)

unsigned char div3_8b(unsigned char a) { return a/3; }
        imull   $171, %edi, %eax
        andl    $65024, %eax            # imm = 0xFE00
        shrl    $9, %eax
        retq

The 8b version looks like a clang bug: it seems to be assuming the upper bits of %edi are zeroed, so the upper bits of the 8x8 -> 16b multiply result are correct. The 64bit ABI doesn't guarantee this.

That's not a problem when inlining, though. And actually, you'll want a version that works for at least 9b numbers in a 32b register, because three 8b numbers can overflow 8b.

Upvotes: 1

user3386109
user3386109

Reputation: 34839

The x86 family of processors has a DIV instruction, that will do the division for you. The DIV instruction uses specific input and output registers depending on the size of the operands. From the programmer's guide:

Divides unsigned the value in the AX, DX:AX, EDX:EAX, or RDX:RAX registers (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, EDX:EAX, or RDX:RAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size (dividend/divisor).

So to divide ESI by three, use this code

mov %esi, %eax
xor %edx, %edx
mov $3, %ecx
div %ecx         
# the result is now in %eax

Upvotes: 0

Related Questions