Ani

Reputation: 109

ARM - Determine whether a computer is Big-Endian or Little-Endian

I'm new to assembly, particularly ARM. I'm trying to understand how the following program determines whether a computer is little-endian or big-endian:

MOV R0, #100
LDR R1, =0X0ABCD876       ;R1 = 0X0ABCD876
STR R1, [R0]
LDRB R2, [R0, #1]

Thank you so much!

Upvotes: 1

Views: 2927

Answers (1)

old_timer

Reputation: 71516

MOV R0, #100
LDR R1, =0X0ABCD876       ;R1 = 0X0ABCD876
STR R1, [R0]
LDRB R2, [R0, #1]

There are no pointers in assembly as such. But if you know enough C to know what a pointer is, it is just an address that you access things at, or at an offset from. The store and the load here use R0 as the base address for those operations. So you are basically pointing at some memory location with the address in r0, just as in C you point at some memory location with an address held in a variable declared as a pointer.

It hardcodes the value 100 into r0. That does look like you copied it down wrong, but decimal 100 is 0x64, whose lower two bits are zero, so you won't get an alignment error on the word store. I assume the code was really mov r0,#0x100, but whatever.

The next line uses a pseudo-instruction syntax that ARM assemblers (so far) support. It is normally used with labels:

ldr r3,=hello
nop
nop
nop
b .
hello: .word 0x12341234

which disassembles to

00000000 <hello-0x14>:
   0:   e59f3010    ldr r3, [pc, #16]   ; 18 <hello+0x4>
   4:   e1a00000    nop         ; (mov r0, r0)
   8:   e1a00000    nop         ; (mov r0, r0)
   c:   e1a00000    nop         ; (mov r0, r0)
  10:   eafffffe    b   10 <hello-0x4>
00000014 <hello>:
  14:   12341234    eorsne  r1, r4, #52, 4  ; 0x40000003
  18:   00000014    andeq   r0, r0, r4, lsl r0

In other words: "please put the address of the label hello into r3 for me, thanks." Otherwise I would have to do this:

ldr r3,hello_add
nop
nop
nop
b .
hello: .word 0x12341234
hello_add: .word hello

00000000 <hello-0x14>:
   0:   e59f3010    ldr r3, [pc, #16]   ; 18 <hello_add>
   4:   e1a00000    nop         ; (mov r0, r0)
   8:   e1a00000    nop         ; (mov r0, r0)
   c:   e1a00000    nop         ; (mov r0, r0)
  10:   eafffffe    b   10 <hello-0x4>

00000014 <hello>:
  14:   12341234    eorsne  r1, r4, #52, 4  ; 0x40000003

00000018 <hello_add>:
  18:   00000014    andeq   r0, r0, r4, lsl r0

which is more typing to get the same result.

So if ldr r7,=something means that something is an address and the syntax says "load that address into the register", then if something is a number, the assembler will just put that number in for me, and I can be lazy and type less:

ldr r3,=0x11223344
nop
nop
nop
b .

00000000 <.text>:
   0:   e59f300c    ldr r3, [pc, #12]   ; 14 <.text+0x14>
   4:   e1a00000    nop         ; (mov r0, r0)
   8:   e1a00000    nop         ; (mov r0, r0)
   c:   e1a00000    nop         ; (mov r0, r0)
  10:   eafffffe    b   10 <.text+0x10>
  14:   11223344            ; <UNDEFINED> instruction: 0x11223344

The end result is that we get that constant into the register. ARM, MIPS, and other fixed(ish)-length instruction sets have limits on immediates, so an immediate that doesn't fit in the ARM instruction causes the assembler to add some data and do a PC-relative load like the one above. But if it fits, then

ldr r3,=0x100
nop
nop
nop
b .


00000000 <.text>:
   0:   e3a03c01    mov r3, #256    ; 0x100
   4:   e1a00000    nop         ; (mov r0, r0)
   8:   e1a00000    nop         ; (mov r0, r0)
   c:   e1a00000    nop         ; (mov r0, r0)
  10:   eafffffe    b   10 <.text+0x10>

Now, we would hope that the assembler has been told which endianness you are going for; if it hasn't, you are in trouble and this may not work. So, whether BE-8 or BE-32, let's assume the comment is correct and R1 really ends up holding 0x0ABCD876.

Then STR is a 32-bit store (read your manual) into the address contained in r0. If this is LE or BE-32, then as written address 100 gets the byte 0x76, address 101 gets 0xD8, address 102 gets 0xBC, and address 103 gets 0x0A. If BE-8, then address 100 gets 0x0A, address 101 gets 0xBC, address 102 gets 0xD8, and address 103 gets 0x76.

The LDRB says: get one byte at address r0+1, which is 101, and put it in r2. LDRB zero-extends the byte (LDRSB is the sign-extending form). So if LE, r2 will hold 0xD8; if BE-32, r2 will get 0xBC; and if BE-8, r2 will also contain 0xBC. As Jester said, you then compare r2 against these values to tell BE from LE.

BE-32 means word invariant: word operations (LDR/STR/LDM/STM) do not swap, while the non-word ones (LDRB, LDRH, STRB, STRH) do swap. BE-8 means byte invariant: the byte operations do not swap, but the non-byte ones (halfword and word) do. So in this case, by mixing a word operation and a byte operation, one of them swaps and the other doesn't, depending on the big-endian flavor; for little endian, neither swaps.

Of course, if the assembler doesn't load r1 correctly (which is key to this whole thing working), remember that the literal load is itself a word operation and may or may not swap yet again. It would have been safer to do:

mov r1,#0x0A000000
orr r1,r1,#0x00BC0000
orr r1,r1,#0x0000d800
orr r1,r1,#0x00000076

Then there are no worries about the PC-relative load swapping, or about the assembler storing the value pre-swapped so that it unswaps on the way in, depending on the architecture. Either way, you have to set the architecture correctly as well as ask for big endian.

Upvotes: 2
