Nathan S

Reputation: 66

How to get qemu to run an arm thumb binary?

I'm trying to learn the basics of ARM assembly and wrote a fairly simple program to sort an array. I initially assembled it using the armv8-a option and ran the program under qemu while debugging with gdb. This worked fine and the program initialized the array and sorted it as expected.

Ultimately I would like to be able to write some assembly for my Raspberry Pi Pico, which has an ARM Cortex-M0+, which I believe uses the armv6-m option. However, when I change the directive in my code, it assembles fine but behaves strangely: the program counter increments by 4 after every instruction instead of the 2 that I expect for Thumb. This is causing my program to not work correctly. I suspect that qemu is trying to run my code as if it were compiled for the full ARM instruction set instead of Thumb, but I'm not sure why this is.

I am running on Ubuntu Linux 20.04 LTS, using qemu-arm version 4.2.1 (installed from the package manager). Does the qemu-arm executable only run full ARM binaries? If so, is there another qemu package I can install to run a Thumb binary?

Here is my code if it is helpful:

.arch armv6-m
.cpu cortex-m0plus

.syntax unified
.thumb

.data
arr: .skip 4 * 10
len: .word 10

.section .text
.global _start

.align 2
_start:
    ldr r0, arr_adr @ load the address of the start of the array into register 0
    movs r1, #0 @ clear the counter register
    movs r2, #100

init_loop:
    str r2, [r0,r1] @ store r2's value to the base address of the array plus the offset stored in r1
    subs r2, r2, #10 @ subtract 10 from r2
    adds r1, #4 @ add 4 to the offset (1 word in bytes)
    cmp r1, #40 @ check if we've reached the end of the array
    bne init_loop

    movs r1, #0 @ clear the offset
out_loop:
    mov r3, r1 @ set the index of the minimum value to the current array index

    mov r4, r1 @ set the inner loop index to the outer loop index

in_loop:
    ldr r5, [r0,r3] @ load the minimum index's value to r5
    ldr r6, [r0,r4] @ load the inner loop's next value to r6
    cmp r6, r5 @ compare the two values
    bge in_loop_inc @ if r6 is greater than or equal to r5, increment and restart loop
    mov r3, r4 @ set the minimum index to the current index
in_loop_inc:
    adds r4, #4
    cmp r4, #40 @ check if at end of array
    blt in_loop

    ldr r5, [r0,r3] @ load the minimum index value into r5
    ldr r6, [r0,r1] @ load the current outer loop index value into r6
    str r6, [r0,r3] @ swap the two values
    str r5, [r0,r1]

    adds r1, #4 @ increment outer loop index
    cmp r1, #40 @ check if at end of array
    blt out_loop

loop:
    nop
    b loop

arr_adr: .word arr
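To check the result in GDB, it can help to have a host-side model of what the program above is meant to compute. This is a hedged Python sketch (the function names are hypothetical): `init_array` mirrors init_loop, storing 100, 90, ..., 10 at byte offsets 0, 4, ..., 36, and `selection_sort` mirrors out_loop/in_loop, a selection sort that steps through byte offsets by 4.

```python
WORD = 4         # bytes per word, as in "adds r1, #4"
LEN_BYTES = 40   # 10 words, as in "cmp r1, #40"

def init_array():
    """Model of init_loop: offset -> value, 100 down to 10 in steps of 10."""
    mem = {}
    r2 = 100
    for r1 in range(0, LEN_BYTES, WORD):
        mem[r1] = r2          # str r2, [r0,r1]
        r2 -= 10              # subs r2, r2, #10
    return mem

def selection_sort(mem):
    """Model of out_loop/in_loop: selection sort over byte offsets."""
    for r1 in range(0, LEN_BYTES, WORD):        # outer loop offset
        r3 = r1                                  # offset of current minimum
        for r4 in range(r1, LEN_BYTES, WORD):    # inner loop offset
            if mem[r4] < mem[r3]:                # bge skips the update when r6 >= r5
                r3 = r4
        mem[r1], mem[r3] = mem[r3], mem[r1]      # the ldr/ldr/str/str swap
    return mem

def as_list(mem):
    return [mem[off] for off in range(0, LEN_BYTES, WORD)]

print(as_list(init_array()))                  # [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
print(as_list(selection_sort(init_array())))  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```

Once the program reaches `loop`, `x/10dw $r0` in GDB should show the same ten words as the second line, since r0 still holds the array's base address.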

Thank you for your help!

Upvotes: 1

Views: 3917

Answers (2)

old_timer

Reputation: 71536

memmap

MEMORY
{
    ram  : ORIGIN = 0x00000000, LENGTH = 32K
}

SECTIONS
{
   .text : { *(.text*) } > ram
}

strap.s

.cpu cortex-m0
.thumb
.syntax unified

.globl reset_entry
reset_entry:
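    .word 0x20001000    @ initial stack pointer
    .word reset         @ reset handler (bit 0 set by .thumb_func)
    .word hang          @ NMI
    .word hang          @ HardFault
    .word hang          @ reserved on Cortex-M0

.thumb_func
reset:
    ldr r0,=0x40002500  @ nRF51 UART ENABLE register
    ldr r1,=4           @ 4 = UART enabled
    str r1,[r0]
    ldr r0,=0x40002008  @ UART TASKS_STARTTX
    ldr r1,=1
    str r1,[r0]

    ldr r0,=0x4000251C  @ UART TXD register
    ldr r1,=0x30        @ ASCII '0'
    ldr r2,=0x37        @ mask that wraps 0x38 ('8') back to 0x30 ('0')
loop_top:
    str r1,[r0]         @ send a character without waiting for TX ready
    adds r1,r1,#1
    ands r1,r1,r2
    b loop_top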

.thumb_func
hang:
    b hang

build

arm-linux-gnueabi-as --warn --fatal-warnings  strap.s -o strap.o
arm-linux-gnueabi-ld strap.o -T memmap -o notmain.elf
arm-linux-gnueabi-objdump -D notmain.elf > notmain.list

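Check the vector table as a quick check: the reset and hang entries must be odd (0x15 and 0x2f here), because .thumb_func marks those labels as Thumb code.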

Disassembly of section .text:

00000000 <reset_entry>:
   0:   20001000    andcs   r1, r0, r0
   4:   00000015    andeq   r0, r0, r5, lsl r0
   8:   0000002f    andeq   r0, r0, pc, lsr #32
   c:   0000002f    andeq   r0, r0, pc, lsr #32
  10:   0000002f    andeq   r0, r0, pc, lsr #32

00000014 <reset>:
  14:   4806        ldr r0, [pc, #24]   ; (30 <hang+0x2>)
  16:   4907        ldr r1, [pc, #28]   ; (34 <hang+0x6>)
  18:   6001        str r1, [r0, #0]
  1a:   4807        ldr r0, [pc, #28]   ; (38

Looks good.
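The same check can be scripted. A minimal Python sketch, assuming the raw bytes of the image start with the vector table (the helper name `check_vectors` is made up). Word 0 is the initial stack pointer and every later word is a handler address, which needs bit 0 set: an M-profile core has no Arm state to switch to.

```python
import struct

def check_vectors(image: bytes, count: int):
    """Decode the first `count` little-endian words of a Cortex-M vector table.

    Returns (initial_sp, [(index, address, has_thumb_bit), ...]).
    """
    words = struct.unpack_from(f"<{count}I", image, 0)
    sp = words[0]
    handlers = [(i, addr, bool(addr & 1)) for i, addr in enumerate(words[1:], start=1)]
    return sp, handlers

# Words from the listing above: SP, reset, then hang three times.
table = struct.pack("<5I", 0x20001000, 0x15, 0x2F, 0x2F, 0x2F)
sp, handlers = check_vectors(table, 5)
print(f"initial SP: {sp:#010x}")   # initial SP: 0x20001000
for i, addr, ok in handlers:
    print(f"vector {i}: {addr:#x}", "ok" if ok else "bit 0 clear")
```

For the real build, a raw image could be made with `arm-linux-gnueabi-objcopy -O binary notmain.elf notmain.bin`; with this memmap, .text (and so the vector table) starts at address 0.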

run it

qemu-system-arm -M microbit -nographic -kernel notmain.elf 

It will print 0123456701234567... until you press Ctrl-A then X to exit QEMU.
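Why the output cycles through 0 to 7: 0x37 is 0b00110111, so ANDing after the increment turns 0x38 ('8') back into 0x30 ('0'). A small Python model of loop_top (the function name is only for illustration):

```python
def uart_output(n: int) -> str:
    """Model the first n characters written by loop_top in strap.s."""
    r1, r2 = 0x30, 0x37          # ldr r1,=0x30 ; ldr r2,=0x37
    chars = []
    for _ in range(n):
        chars.append(chr(r1))    # str r1,[r0]: one character to TXD
        r1 = (r1 + 1) & r2       # adds r1,r1,#1 ; ands r1,r1,r2
    return "".join(chars)

print(uart_output(16))  # 0123456701234567
```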

Note that this binary will not work on a real chip, because I am cutting corners with the UART.

You can get your feet wet with this sim. There is also a Luminary Micro model from the first Cortex-M chips, and you can limit yourself to ARMv6-M instructions on that platform as well.

qemu and sims like this have very limited value for MCU work, since almost all of the work is related to peripherals and pins. The instruction set is like the language of a book: French, Russian, English, German, it doesn't matter; a biology book is a biology book, and the book is the goal. The peripherals are specific to the chip (the Pico, a specific STM32 chip, a specific TI Tiva C chip, etc.).

Upvotes: 3

Peter Maydell

Reputation: 11393

There are a couple of concepts to disentangle here:

(1) Arm vs Thumb: these are two different instruction sets. Most CPUs support both; some support only one. If the CPU supports both, both are available at the same time. To simplify a little, jumping to an address with the least significant bit set means "go to Thumb mode", and jumping to an address with that bit clear means "go to Arm mode". (Interworking is a touch more complicated than that, but that's a good initial mental model.) Note that all Arm instructions are 4 bytes long, but Thumb instructions can be either 2 or 4 bytes long.
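The mental model in (1) fits in a few lines. This Python sketch is deliberately simplified in the same way (the function name is hypothetical, and real interworking has extra cases). It also reflects that an M-profile core has no Arm state, so an even target there leads to a fault rather than a switch.

```python
def interwork_target(addr: int, m_profile: bool = False):
    """Simplified model: (state, pc) selected by a BX/BLX-style jump to addr."""
    if addr & 1:
        return "Thumb", addr & ~1   # bit 0 set: Thumb state, PC has bit 0 cleared
    if m_profile:
        return "fault", addr        # no Arm state on M-profile: next fetch faults
    return "Arm", addr              # bit 0 clear: Arm state

state, pc = interwork_target(0x15)  # the reset vector from the other answer's listing
print(state, hex(pc))               # Thumb 0x14
```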

(2) A-profile vs M-profile: these are two different families of CPU architecture. M-profile is "microcontrollers"; A-profile is "applications processors", which is "(almost) everything else". M-profile CPUs always support Thumb and only Thumb code. A-profile CPUs support both Arm and Thumb. The Raspberry Pi Pico is a Cortex-M0+, which is M-profile.

(3) QEMU system emulation vs user-mode emulation: these are two different QEMU executables that run guest code in different ways. The system emulation binary (typically qemu-system-arm) runs "bare metal" code, e.g. an entire OS. The guest code has full control and can handle exceptions, write to hardware devices, etc. The user emulation binary (typically qemu-arm) is for running Linux user-space binaries. Guest code is started in unprivileged mode and has access to the usual Linux system calls. For system emulation, which CPU is being emulated depends on what machine type you select with the -M or --machine option. For user-mode emulation, the default CPU is "A-profile with all supported features enabled" (this is --cpu max).

You're currently using qemu-arm which means you get user-mode emulation. This should support Thumb binaries, but unless you pass it a --cpu option it will be using an A-profile CPU. I would also suggest using a newer QEMU for M-profile work, because a lot of M-profile CPU bugs have been fixed since version 4.2. I think 4.2 is also too old to have the Cortex-M0 CPU.
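Related to (1): qemu-arm, like the Linux kernel, picks the initial state from bit 0 of the ELF entry point. A `_start` label that is not marked as a Thumb function (strap.s in the other answer uses `.thumb_func` for this) has bit 0 clear, which would be consistent with the PC advancing by 4. A minimal sketch for checking that bit, assuming a 32-bit little-endian ELF file (the helper names are made up):

```python
import struct

def elf32_entry(header: bytes) -> int:
    """Return e_entry from the start of a 32-bit little-endian ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 1 or header[5] != 1:  # EI_CLASS = ELFCLASS32, EI_DATA = ELFDATA2LSB
        raise ValueError("expected a 32-bit little-endian ELF file")
    (entry,) = struct.unpack_from("<I", header, 24)  # e_entry is at offset 24 in ELF32
    return entry

def starts_in_thumb(header: bytes) -> bool:
    """True if bit 0 of the entry point is set, i.e. execution starts in Thumb state."""
    return (elf32_entry(header) & 1) == 1

# Synthetic header: e_ident, then e_type=2 (EXEC), e_machine=40 (ARM), e_version=1, e_entry.
fake = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HHII", 2, 40, 1, 0x10075)
print(hex(elf32_entry(fake)), starts_in_thumb(fake))  # 0x10075 True
```

For a real executable, pass the first 52 bytes of the file (the size of an ELF32 header); `arm-linux-gnueabi-readelf -h` shows the same value as "Entry point address".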

GDB should tell you in the PSR what the T bit is set to -- use that to check whether you're in Thumb mode or Arm mode, rather than looking at how much the PC is incrementing by.
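Reading the T bit by hand: under qemu-arm, `p/x $cpsr` prints the A-profile CPSR, where T is bit 5; on an M-profile target GDB typically shows `xpsr` instead, where T is bit 24. A small Python decoder (the name is illustrative):

```python
CPSR_T = 1 << 5    # A-profile CPSR: T (Thumb) bit is bit 5
XPSR_T = 1 << 24   # M-profile xPSR (EPSR field): T bit is bit 24

def in_thumb_state(psr: int, m_profile: bool = False) -> bool:
    """Decode the T bit from a PSR value as printed by GDB."""
    mask = XPSR_T if m_profile else CPSR_T
    return bool(psr & mask)

print(in_thumb_state(0x10))                        # False: user mode, Arm state
print(in_thumb_state(0x30))                        # True: user mode, Thumb state
print(in_thumb_state(0x01000000, m_profile=True))  # True
```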

There's currently no QEMU system emulation of the Raspberry Pi Pico (though somebody has been doing some experimental work on one). If your assembly is just basic "working with registers and a bit of memory" you can do that with the user-mode emulator. Or you can try the 'microbit' machine model, which is a Cortex-M0 board -- if you're not doing things that are specific to the Pi Pico that might be good enough.

Upvotes: 3
