Reputation: 57
I'm a beginner in assembly and I'm having difficulties with an exercise where I have to implement a filter of an image encoded in BMP in assembly. Each pixel is encoded on 24 bits so that, each component (blue, green, red) of the pixel is encoded on 8 bits. To represent the image I have an array of uint8_t
. For example the first pixel of the array is represented by array[0] (blue component), array[1] (green component) and array[2] (red component). What I have to do is to implement a filter that finds the mean of the 3 components of each pixel and fix the value of each component to that mean. My problem is then to calculate that mean. The signature of the function is extern size_t filter(const uint8_t* in_buf, uint32_t w, uint32_t h);
Here is my code.
.text
.global filter
filter:
push %ebp
mov %esp, %ebp
xor %ecx, %ecx #increment variable for_width = 0
for_width:
xor %ebx, %ebx #increment variable for_height = 0
for_height:
calcul_offset: # find the position of the position of an i, j pixel
mov %ecx, %esi
mov %ebx, %edx
imull $3, %esi
imull $3, %edx
imull 12(%ebp), %edx #12(%ebp) = width of the image
add %edx, %esi
calcul_mean:
mov %esi, %edx
add 8(%esi), %edx
add 16(%esi), %edx #edx contains the sum of the 3 components
change_pixel: # supposing edx contains the mean of the 3 components
mov 8(%ebp), %esi #8(%ebp) is the adress of the array
mov %edx, (%esi) #blue component
mov %edx, 8(%esi) #green component
mov %edx, 16(%esi) #red component
#here is my problem - to divide %esi by 3
inc %ebx
cmp 16(%ebp), %ebx #16(%ebp) = height of the image
jle for_height
inc %ecx
cmp 12(%ebp), %ecx
jle for_width
# retour
ret
Note that my code is not complete yet and that i m just trying to figure out the division at the moment.
Thank you ! Alexis
Upvotes: 1
Views: 402
Reputation: 365267
Since you asked for feedback on your code: instead of multiplying the loop counter by stuff every iteration, use an add to walk a pointer through the array. This is called loop strength-reduction, because add
is a less-expensive operation than imul
.
Even without strength-reducing, those mov-and-imul sequences are really silly. imul
-immediate is a 3-operand instruction, with a write-only destination. imul $3, %esi
is just short-hand for imul $3, %esi, %esi
. So obviously you could drop the mov %ecx, %esi
. You can also use LEA to multiply by small constants. It's more obvious in Intel syntax (lea esi, [ecx + ecx*2]
), but lea (%ecx, %ecx, 2), %esi
will do the trick.
You can probably also take advantage of 2-register addressing modes instead of doing so much work to get the final address into %esi
. However it might be better to stick to one-register addressing modes for perf reasons on Intel SnB-family CPUs.
I thought you said your values were 1 byte uint8_t
color components? add 8(%esi), %edx
has a 4B memory source operand. I haven't read your code carefully, but if you want to treat color components separately, you need add 8(%esi), %dl
and similar.
This is a good candidate for SIMD-vectorization.
div
is very slow. It's probably faster to replace the actual divide by a multiply and shift, like any good compiler does: clang-3.8 output from that godbolt link:
unsigned div3_32b(unsigned a) { return a/3; }
movl %edi, %ecx
movl $2863311531, %eax # imm = 0xAAAAAAAB
imulq %rcx, %rax
shrq $33, %rax
retq
gcc uses a 32bit one-operand mul instead of 64b imul, so have a look at that in the godbolt link if that matters, otherwise 64b-imul
will be faster on most CPUs, IIRC. (see the x86 tag wiki for optimization links.)
unsigned char div3_8b(unsigned char a) { return a/3; }
imull $171, %edi, %eax
andl $65024, %eax # imm = 0xFE00
shrl $9, %eax
retq
The 8b version looks like a clang bug: it seems to be assuming the upper bits of %edi
are zeroed, so the upper bits of the 8x8 -> 16b multiply result are correct. The 64bit ABI doesn't guarantee this.
That's not a problem when inlining, though. And actually, you'll want a version that works for at least 9b numbers in a 32b register, because three 8b numbers can overflow 8b.
Upvotes: 1
Reputation: 34839
The x86 family of processors has a DIV
instruction, that will do the division for you. The DIV
instruction uses specific input and output registers depending on the size of the operands. From the programmer's guide:
Divides unsigned the value in the AX, DX:AX, EDX:EAX, or RDX:RAX registers (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, EDX:EAX, or RDX:RAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size (dividend/divisor).
So to divide ESI
by three, use this code
mov %esi, %eax
xor %edx, %edx
mov $3, %ecx
div %ecx
# the result is now in %eax
Upvotes: 0