Michael Stachowsky

Reputation: 787

In the ARM ABI, how are global variables accessed?

I am writing a simple multitasking OS for the ARM Cortex M3. My threads always run using the Process Stack Pointer. I have an application that I inherited and that uses global variables. I am trying to call the functions in that application from my threading code but it is not accessing memory correctly. Are the following statements correct:

  1. Those global variables are accessed via some kind of relative addressing, and that relative address is placed on the Main stack (using MSP)?

  2. My threading code, using PSP, will never be able to access them

  3. I need to switch to MSP when calling these functions, then back to PSP when using my threads?

**EDIT:** Clarified that this is for a Cortex-M.

Upvotes: 1

Views: 415

Answers (1)

old_timer

Reputation: 71566

Global variables have nothing to do with the stack, and neither do static locals.

Just look at the output of the compiler; it will tell you everything.

Your question is very vague; you could be asking one of many different questions. I will show some basics and maybe I will get lucky.

Note that this should in general have nothing to do with the processor, mode, etc. (arm, thumb, x86, whatever). It has much more to do with the toolchain.

If this is too basic and you are asking some very advanced question that is not obvious to me, I will delete or rewrite, no problem.

Throwaway code is always a good idea to figure things out.

flash.s

.thumb
.syntax unified

.word 0x20001000
.word reset

.thumb_func
reset:
    bl notmain
    b .

notmain.c

unsigned int x;
unsigned int y=5;

void notmain ( void )
{
    static unsigned int z=7;
    x=++y;
    z--;
}

flash.ld

MEMORY
{
  rom     : ORIGIN = 0x00080000, LENGTH = 0x00001000
  ram     : ORIGIN = 0x20000000, LENGTH = 0x00001000
}

SECTIONS
{
    .text : { *(.text)   } > rom
    .bss : { *(.bss)   } > ram
    .data : { *(.data)   } > ram
}

build

arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m0 -c notmain.c -o notmain.o
arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld flash.o notmain.o -o flash.elf
arm-none-eabi-objdump -D flash.elf > flash.list
arm-none-eabi-objcopy -O binary flash.elf flash.bin

examine

Disassembly of section .text:

00080000 <reset-0x8>:
   80000:   20001000    andcs   r1, r0, r0
   80004:   00080009    andeq   r0, r8, r9

00080008 <reset>:
   80008:   f000 f802   bl  80010 <notmain>
   8000c:   e7fe        b.n 8000c <reset+0x4>
    ...

00080010 <notmain>:
   80010:   4b04        ldr r3, [pc, #16]   ; (80024 <notmain+0x14>)
   80012:   4905        ldr r1, [pc, #20]   ; (80028 <notmain+0x18>)
   80014:   681a        ldr r2, [r3, #0]
   80016:   3201        adds    r2, #1
   80018:   601a        str r2, [r3, #0]
   8001a:   600a        str r2, [r1, #0]
   8001c:   685a        ldr r2, [r3, #4]
   8001e:   3a01        subs    r2, #1
   80020:   605a        str r2, [r3, #4]
   80022:   4770        bx  lr
   80024:   20000004    andcs   r0, r0, r4
   80028:   20000000    andcs   r0, r0, r0

Disassembly of section .bss:

20000000 <x>:
20000000:   00000000    andeq   r0, r0, r0

Disassembly of section .data:

20000004 <y>:
20000004:   00000005    andeq   r0, r0, r5

20000008 <z.3645>:
20000008:   00000007    andeq   r0, r0, r7

This is the basic, non-relocatable case: everything is linked at fixed addresses.

   80010:   4b04        ldr r3, [pc, #16]   ; (80024 <notmain+0x14>)

   80014:   681a        ldr r2, [r3, #0]
   80016:   3201        adds    r2, #1
   80018:   601a        str r2, [r3, #0]

   80024:   20000004    andcs   r0, r0, r4


Disassembly of section .data:

20000004 <y>:
20000004:   00000005    andeq   r0, r0, r5

We can see the ++y: r3 gets the address of y, r2 gets the value of y, r2 is incremented, and then it is saved back to memory.

And you can see how x and z are handled as well.
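To make the literal-pool lookup concrete, here is a small sketch of the address arithmetic behind `ldr r3, [pc, #16]`. The function name is made up for illustration. It relies on the Thumb rule that the PC reads as the instruction address plus 4, rounded down to a multiple of 4 for this kind of load.

```c
#include <stdint.h>

/* Address read by a Thumb "ldr rX, [pc, #imm]" located at insn_addr.
   The PC reads as the instruction address plus 4, and for a literal
   load it is rounded down to a multiple of 4 before imm is added. */
uint32_t thumb_ldr_literal_addr(uint32_t insn_addr, uint32_t imm)
{
    uint32_t pc = (insn_addr + 4u) & ~3u;
    return pc + imm;
}
```

With the listing above, the load at 0x80010 with #16 lands on 0x80024 (the word holding the address of y), and the load at 0x80012 with #20 lands on 0x80028 (the word holding the address of x).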

Now this cannot work on an mcu as built, for a couple of reasons. The initial values linked at 0x20000000 will not be there: only what is in non-volatile storage is there when the chip powers up and comes out of reset. The above may still be relevant, depending on what your real question is.

MEMORY
{
  rom     : ORIGIN = 0x00080000, LENGTH = 0x00001000
  ram     : ORIGIN = 0x20000000, LENGTH = 0x00001000
}

SECTIONS
{
    .text : { *(.text)   } > rom
    .bss : { *(.bss)   } > ram AT > rom
    .data : { *(.data)   } > ram AT > rom
}

The program does not change, but the binary does:

00000000  00 10 00 20 09 00 08 00  00 f0 02 f8 fe e7 00 00  |... ............|
00000010  04 4b 05 49 1a 68 01 32  1a 60 0a 60 5a 68 01 3a  |.K.I.h.2.`.`Zh.:|
00000020  5a 60 70 47 04 00 00 20  00 00 00 20 05 00 00 00  |Z`pG... ... ....|
00000030  07 00 00 00                                       |....|
00000034

At 0x2C we see the preload value for y and at 0x30 for z.
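If reading the dump by eye is awkward, a throwaway helper like the one below decodes a little-endian word from the image. The name `read_le32` is hypothetical; it is just a sketch for poking at flash.bin on the host.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit little-endian word from four bytes of the image. */
uint32_t read_le32(const uint8_t *image, size_t offset)
{
    return (uint32_t)image[offset]
         | ((uint32_t)image[offset + 1] << 8)
         | ((uint32_t)image[offset + 2] << 16)
         | ((uint32_t)image[offset + 3] << 24);
}
```

Applied to the bytes `05 00 00 00` at 0x2C it yields 5, and `04 00 00 20` at 0x24 decodes to 0x20000004, the address of y.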

The .bss value is not located here. Normally what you do is add a whole lot more stuff to the linker script to get the addresses of things: data start and stop, and bss start and size or stop. Then you add a bootstrap that copies .data from flash to ram, so that the initialized values are in ram and read/write works, and that zeros .bss.
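A minimal sketch of that bootstrap step in C, assuming the linker script exports the five addresses. The parameter names are hypothetical and the linker script here does not define any such symbols yet; on a real target you would pass the addresses of whatever symbols you add.

```c
#include <stdint.h>

/* Copy the .data initial values from their load address (flash) to their
   run address (ram), then zero .bss. All five pointers are assumed to
   come from linker-script symbols; the names are hypothetical. */
void init_ram(const uint32_t *data_load,
              uint32_t *data_start, uint32_t *data_end,
              uint32_t *bss_start, uint32_t *bss_end)
{
    uint32_t *dst;

    for (dst = data_start; dst < data_end; )
        *dst++ = *data_load++;   /* preload values, e.g. y=5 and z=7 */

    for (dst = bss_start; dst < bss_end; )
        *dst++ = 0u;             /* x starts out as zero */
}
```

This has to run before any C code that depends on the initial values, which is why it usually lives in or right after the reset handler.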

So if your project, call it an operating system or not, is just one large body of code that is compiled and linked all together, without doing special things like lots of sections, then the above is what you are looking at, and the stack is not related to globals. Normally it never is.
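A quick way to convince yourself is throwaway code along these lines. It is only a sketch (the function name `bump` is made up): the globals and the static local keep their values between calls because they sit at fixed addresses, while the ordinary local starts over every time.

```c
unsigned int x;        /* .bss: zero-initialized global */
unsigned int y = 5;    /* .data: initialized global */

unsigned int bump(void)
{
    static unsigned int z = 7;  /* .data, despite the local scope */
    unsigned int tmp = 0;       /* stack or register, starts over each call */

    x = ++y;
    z--;
    tmp++;
    return z + tmp;
}
```

Two calls in a row return 7 and then 6: y and z carry over from one call to the next, tmp does not. Which stack pointer is active during the calls makes no difference to x, y, or z.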

(msp/psp do not work the way arm implies they do. I have yet to see a use case for the second stack pointer, IF the processor even has it; they do not all have it implemented.)

Now if your threads are actually separately built programs that you load at runtime, then they live completely in ram. So:

MEMORY
{
  rom     : ORIGIN = 0x00080000, LENGTH = 0x00001000
  ram     : ORIGIN = 0x20000000, LENGTH = 0x00001000
}

SECTIONS
{
    .text : { *(.text)   } > ram
    .bss : { *(.bss)   } > ram
    .data : { *(.data)   } > ram
}

and we add -fPIC

arm-none-eabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m0 -fPIC -c notmain.c -o notmain.o

Disassembly of section .text:

20000000 <reset-0x8>:
20000000:   20001000    andcs   r1, r0, r0
20000004:   20000009    andcs   r0, r0, r9

20000008 <reset>:
20000008:   f000 f802   bl  20000010 <notmain>
2000000c:   e7fe        b.n 2000000c <reset+0x4>
    ...

20000010 <notmain>:
20000010:   4a07        ldr r2, [pc, #28]   ; (20000030 <notmain+0x20>)
20000012:   4b08        ldr r3, [pc, #32]   ; (20000034 <notmain+0x24>)
20000014:   447a        add r2, pc
20000016:   58d1        ldr r1, [r2, r3]
20000018:   680b        ldr r3, [r1, #0]
2000001a:   3301        adds    r3, #1
2000001c:   600b        str r3, [r1, #0]
2000001e:   4906        ldr r1, [pc, #24]   ; (20000038 <notmain+0x28>)
20000020:   5852        ldr r2, [r2, r1]
20000022:   6013        str r3, [r2, #0]
20000024:   4a05        ldr r2, [pc, #20]   ; (2000003c <notmain+0x2c>)
20000026:   447a        add r2, pc
20000028:   6813        ldr r3, [r2, #0]
2000002a:   3b01        subs    r3, #1
2000002c:   6013        str r3, [r2, #0]
2000002e:   4770        bx  lr
20000030:   00000034    andeq   r0, r0, r4, lsr r0
20000034:   00000004    andeq   r0, r0, r4
20000038:   00000000    andeq   r0, r0, r0
2000003c:   0000001a    andeq   r0, r0, sl, lsl r0

Disassembly of section .bss:

20000040 <x>:
20000040:   00000000    andeq   r0, r0, r0

Disassembly of section .data:

20000044 <z.3645>:
20000044:   00000007    andeq   r0, r0, r7

20000048 <y>:
20000048:   00000005    andeq   r0, r0, r5

Disassembly of section .got:

2000004c <.got>:
2000004c:   20000040    andcs   r0, r0, r0, asr #32
20000050:   20000048    andcs   r0, r0, r8, asr #32

Disassembly of section .got.plt:

20000054 <_GLOBAL_OFFSET_TABLE_>:
    ...

This is because you may need to be able to load the program anywhere in ram (within rules).

The code is all relative, but the data, because of the nature of compiling and linking, needs some hardcoding. So the tools set up a global offset table (GOT). The location of the GOT is relative to the code; you cannot change that.

20000010:   4a07        ldr r2, [pc, #28]   ; (20000030 <notmain+0x20>)
20000012:   4b08        ldr r3, [pc, #32]   ; (20000034 <notmain+0x24>)
20000014:   447a        add r2, pc
20000016:   58d1        ldr r1, [r2, r3]
20000018:   680b        ldr r3, [r1, #0]
2000001a:   3301        adds    r3, #1
2000001c:   600b        str r3, [r1, #0]

There is your y++ when built position independent.

r2 gets an offset and r3 gets another offset. r2 holds the offset from the code to the GOT (you cannot separate them and move one around and not the other; that is not what position independent means), so after the add, r2 points to the GOT. r3 is the offset within the GOT of the entry that holds the address of y. r1 gets the address of y, and from there it is like before: get y in r3, add one, save y to memory.
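The arithmetic can be checked with a few lines of host code. This is a sketch with made-up function names; it relies on the Thumb rule that `add rX, pc` sees the PC as the instruction address plus 4.

```c
#include <stdint.h>

/* Value left in rX by a Thumb "add rX, pc" at insn_addr when rX held
   `literal` beforehand: the PC reads as the instruction address plus 4. */
uint32_t thumb_add_pc(uint32_t insn_addr, uint32_t literal)
{
    return literal + insn_addr + 4u;
}

/* Address of the GOT slot selected by a byte offset from the GOT base. */
uint32_t got_slot_addr(uint32_t got_base, uint32_t slot_offset)
{
    return got_base + slot_offset;
}
```

For the listing above, 0x34 added at 0x20000014 gives 0x2000004C, the start of .got, and offset 4 selects the slot at 0x20000050, which holds 0x20000048, the address of y. The static local z skips the GOT entirely: 0x1a added at 0x20000026 gives 0x20000044 directly.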

Now IF you were to relocate this to an address that is not 0x20000000, your bootstrap needs to go to the GOT and patch up all the addresses, so you need linker magic to find where the GOT is and how big it is, etc. Use the pc to figure out where you are and then make the adjustments. If loaded into memory at 0x20002000, then you need to add 0x2000 to each of the entries in the table and then it will all just work. (Still no stack stuff; the stack is not related.)
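The patch-up itself is a short loop. A sketch, assuming the bootstrap already knows where the GOT is and how many entries it has (both would come from the linker script), and assuming every entry is an address inside the relocated image. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the load offset to every GOT entry. `got` and `entries` are
   assumed to come from linker-script symbols; `load_delta` is the
   actual load address minus the link address. */
void relocate_got(uint32_t *got, size_t entries, uint32_t load_delta)
{
    for (size_t i = 0; i < entries; i++)
        got[i] += load_delta;
}
```

With the two entries from the listing (0x20000040 and 0x20000048) and a delta of 0x2000, the table ends up holding 0x20002040 and 0x20002048.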

A little trick if you have the space.

Notice I put bss before data, and I have at least one .data item. If you can guarantee that (force a .data item in your bootstrap, for example), the binary carries the zeros for .bss:

00000000  00 10 00 20 09 00 00 20  00 f0 02 f8 fe e7 00 00  |... ... ........|
00000010  07 4a 08 4b 7a 44 d1 58  0b 68 01 33 0b 60 06 49  |.J.KzD.X.h.3.`.I|
00000020  52 58 13 60 05 4a 7a 44  13 68 01 3b 13 60 70 47  |RX.`.JzD.h.;.`pG|
00000030  34 00 00 00 04 00 00 00  00 00 00 00 1a 00 00 00  |4...............|
00000040  00 00 00 00 07 00 00 00  05 00 00 00 40 00 00 20  |............@.. |
00000050  48 00 00 20 00 00 00 00  00 00 00 00 00 00 00 00  |H.. ............|
00000060

20000040 <x>:
20000040:   00000000    andeq   r0, r0, r0

Objcopy -O binary fills the .bss gap with zeros here because .data follows it. If you put .bss last, do not assume the zeros will be there.

So I do not know how this code you have uses threads and globals. Does it try to keep variables specific to each thread? If so, does it use static locals up front and then pass their addresses on the stack? Even there, the stack pointer you use does not matter unless you are not using the stack properly in general, and in that case globals are not your problem.

If you start the thread, or any code, on one stack pointer (implying completely separate stacks, that is, separate memory regions) and then switch, you abandon the stack information the code needs to work its way in and out of functions. If you then return from functions after switching stacks, all the code would break, not just pointers to static locals that are passed along.

So a minimal example that demonstrates the problem can confirm for us what is really going on, what your questions really are, and what the problem is. If you want to use the two stack pointers on a cortex-m, you need to read up carefully and also write some throwaway code examples to see how it works, and then apply that to the code the tools are generating.

Again if this is too elementary and I am miles away from the real question, I will certainly delete this no problem.

Upvotes: 2
