Reputation:
i have got the following c-function
int main_compare (int nbytes, char *pmem1, char *pmem2){
for(nbytes--; nbytes>=0; nbytes--) {
if(*(pmem1+nbytes) - *(pmem2+nbytes) != 0) {
return 0;
}
}
return 1;
}
and i want to convert it into an ARM - Cortex M3 - assembler code. I'm not really good at this, and i don't have a suitable compiler to test if i do it right. But here comes what i have so far
byte_cmp_loop PROC
; assuming: r0 = nbytes, r1=pmem1, r2 = pmem2
SUB R0, R0, #1 ; nBytes - 1 as maximal value for loop counter
_for_loop:
ADD R3, R1, R0 ;
ADD R4, R2, R0 ; calculate pmem + n
LDRB R3, [R3] ;
LDRB R4, [R4] ; look at this address
CMP R3, R4 ; if cmp = 0, then jump over return
BE _next ; if statement by "branch"-cmd
MOV R0, #0 ; return value is zero
BX LR ; always return 0 here
_next:
sub R0, R0, #1 ; loop counting
BLPL _for_loop ; pl = if positive or zero
MOV R0, #1 ;
BX LR ; always return 1 here
ENDP
but i'm really not sure, if this is right, but i have no idea how to check it....
Upvotes: 2
Views: 5399
Reputation: 71506
Well, here is some compiler generated code
arm-none-eabi-gcc -O2 -mthumb -c test.c -o test.o
arm-none-eabi-objdump -D test.o
00000000 <main_compare>:
0: b510 push {r4, lr}
2: 3801 subs r0, #1
4: d502 bpl.n c <main_compare+0xc>
6: e007 b.n 18 <main_compare+0x18>
8: 3801 subs r0, #1
a: d305 bcc.n 18 <main_compare+0x18>
c: 5c0c ldrb r4, [r1, r0]
e: 5c13 ldrb r3, [r2, r0]
10: 429c cmp r4, r3
12: d0f9 beq.n 8 <main_compare+0x8>
14: 2000 movs r0, #0
16: e000 b.n 1a <main_compare+0x1a>
18: 2001 movs r0, #1
1a: bc10 pop {r4}
1c: bc02 pop {r1}
1e: 4708 bx r1
arm-none-eabi-gcc -O2 -mthumb -mcpu=cortex-m3 -c test.c -o test.o
arm-none-eabi-objdump -D test.o
00000000 <main_compare>:
0: 3801 subs r0, #1
2: b410 push {r4}
4: d503 bpl.n e <main_compare+0xe>
6: e00a b.n 1e <main_compare+0x1e>
8: f110 30ff adds.w r0, r0, #4294967295 ; 0xffffffff
c: d307 bcc.n 1e <main_compare+0x1e>
e: 5c0c ldrb r4, [r1, r0]
10: 5c13 ldrb r3, [r2, r0]
12: 429c cmp r4, r3
14: d0f8 beq.n 8 <main_compare+0x8>
16: 2000 movs r0, #0
18: f85d 4b04 ldr.w r4, [sp], #4
1c: 4770 bx lr
1e: 2001 movs r0, #1
20: f85d 4b04 ldr.w r4, [sp], #4
24: 4770 bx lr
26: bf00 nop
It is funny that the thumb2 extensions dont really seem to make this better, possibly worse.
If you dont have a compiler does that mean you dont have an assembler and linker either? I without an assembler and linker it is going to be a lot of work hand compiling and assembling to machine code. Then how are you going to load this into a processor, etc?
if you dont have a cross compiler for arm do you have a compiler at all? You need to tell us more about what you do and dont have. If you have a web browser that you used to find stackoverflow and post questions you can probably download the code sourcery tools or https://launchpad.net/gcc-arm-embedded tools and have a compiler, assembler and linker (and dont have to hand convert from c to asm).
As far as your code goes the subtract of 1 is correct for the nbytes--, but you failed to compare that nbytes value with zero to see if you dont have to do anything at all.
in pseudo code
if nbytes >= 0 return 1
nbytes--;
add pmem1+nbytes
load [pmem1+nbytes]
add pmem2+nbytes
load [pmem2+nbytes]
subtract
compare with zero
and so on
you went straight to the nbytes-- without doing the if nbytes>=0; comparison.
The assembly for branch if equal is BEQ not BE and BPL instead of BLPL. So fix those, at the very beginning do an unconditional branch to _next and I think that is it you have it coded.
byte_cmp_loop PROC
; assuming: r0 = nbytes, r1=pmem1, r2 = pmem2
B _next
_for_loop:
ADD R3, R1, R0 ;
ADD R4, R2, R0 ; calculate pmem + n
LDRB R3, [R3] ;
LDRB R4, [R4] ; look at this address
CMP R3, R4 ; if cmp = 0, then jump over return
BEQ _next ; if statement by "branch"-cmd
MOV R0, #0 ; return value is zero
BX LR ; always return 0 here
_next:
sub R0, R0, #1 ; loop counting
BPL _for_loop ; pl = if positive or zero
MOV R0, #1 ;
BX LR ; always return 1 here
ENDP
Upvotes: 0
Reputation: 20914
I see just 3 fairly simple problems there:
BE _next ; if statement by "branch"-cmd
...
sub R0, R0, #1 ; loop counting
BLPL _for_loop ; pl = if positive or zero
BEQ
, not BE
- condition codes are always 2 letters.SUB
alone won't update the flags - you need the suffix to say so i.e. SUBS
.BLPL
would branch and link, thus overwriting your return address - you want BPL
. Actually, BLPL
wouldn't assemble here anyway, since in Thumb a conditional BL
would need an IT
to set it up (unless of course your assembler is clever enough to insert one automatically).Edit: there's also of course a more general issue with the use of R4
in both the original code and my examples below - if you're interfacing with C code the original value must be preserved across the function call and restored afterwards (R0
-R3
are designated argument/scratch registers and can be freely modified). If you're in pure assembly however you don't necessarily need to follow a standard calling convention so can be more flexible.
Now, that's a very literal representation of the C code, and doesn't make best use of the instruction set - in particular the indexed addressing modes. One of the attractions of assembly programming is having complete control of the instructions, so how can we make it worth our while?
First, let's make the C code look a little more like the assembly we want:
int main_compare (int nbytes, char *pmem1, char *pmem2){
while(nbytes-- > 0) {
if(*pmem1++ != *pmem2++) {
return 0;
}
}
return 1;
}
Now that that shows our intent more clearly, let's play compiler:
byte_cmp_loop PROC
; assuming: r0 = nbytes, r1=pmem1, r2 = pmem2
_loop:
SUBS R0, R0, #1 ; Decrement nbytes and set flags based on the result
BMI _finished ; If nbytes is now negative, it was 0, so we're done
LDRB R3, [R1], #1 ; Load from the address in R1, then add 1 to R1
LDRB R4, [R2], #1 ; ditto for R2
CMP R3, R4 ; If they match...
BEQ _loop ; then continue round the loop
MOV R0, #0 ; else give up and return zero
BX LR
_finished:
MOV R0, #1 ; Success!
BX LR
ENDP
And that's nearly 25% fewer instructions! Now if we pull in another instruction set feature - conditional execution - and relax the requirements slightly, without breaking C semantics, it gets smaller still:
byte_cmp_loop PROC
; assuming: r0 = nbytes, r1=pmem1, r2 = pmem2
_loop:
SUBS R0, R0, #1 ; In C zero is false and any nonzero value is true, so
; when R0 becomes -1 to trigger this branch, we can just
; return that to indicate success
IT MI ; Make the following instruction conditional on 'minus'
BXMI LR
LDRB R3, [R1], #1
LDRB R4, [R2], #1
CMP R3, R4
BEQ _loop
MOVS R0, #0 ; Using MOVS rather than MOV to get a 16-bit encoding,
; since updating the flags won't matter at this point
BX LR
ENDP
assembling to a meagre 22 bytes, that's nearly 40% less code than we started with :D
Upvotes: 1