Add command under ARM architecture and program counter

Question

I'm looking at a snippet of ARM assembly that uses the add instruction. The snippet (see below) simply says: add to the program counter the offset needed to reach the string stored at L_.str, where L_.str is the symbol (the address) of a string contained in the data segment.

movw    r0, :lower16:(L_.str-(LPC1_0+4))
movt    r0, :upper16:(L_.str-(LPC1_0+4))
LPC1_0:
    add r0, pc

The first two instructions (movw and movt) load the 32-bit number representing the address of that string. I'm in Thumb mode, right? That said, I have difficulty figuring out the overall memory layout. Is the following the right representation of the code segment in memory? In addition, are LPC1_0 and L_.str the base addresses of add r0, pc and of the string "A simple string", respectively? What is the size of each box: 32 or 64 bits, depending on the architecture?

--------------------------------------------
| movw    r0, :lower16:(L_.str-(LPC1_0+4)) |
--------------------------------------------
| movt    r0, :upper16:(L_.str-(LPC1_0+4)) |
-------------------------------------------- LPC1_0
| add r0, pc                               |
--------------------------------------------
                       .
                       .
                       .
-------------------------------------------- L_.str
| "A simple string"                        |
--------------------------------------------

If so, I could just retrieve the offset (that will be added to the pc) as the difference L_.str-LPC1_0. But here +4 is also taken into account.

From ADD, pc or sp relative

ADD Rd, Rp, #expr

If Rp is the pc, the value used is: (the address of the current instruction + 4) AND &FFFFFFFC.

So it appears that if Rp is the pc, I also need to account for 4 more bytes in the offset. OK, so where are these bytes added? Why are these 4 bytes taken into account in the mov instructions rather than just before the add instruction? Is this an optimization introduced by the compiler?

Notlikethat · Accepted Answer

The normal position-independent "get the address of something" instruction would be simply adr r0, L_.str (which is equivalent to having the assembler/linker automatically calculate an appropriate offset for add r0, pc, #offset). However, since the ARM architecture uses fixed-width encodings - ARM instructions are 32 bits wide, Thumb instructions are either 16 or 32 bits - there are only a limited number of bits of the instruction available to encode the immediate value for the offset, so the maximum range is limited. The maximum possible offset that a Thumb encoding of adr can support is +/-4095 bytes. Since the compiler has no idea how far apart the linker will put the sections, it can't safely emit adr for risk of the final offset being too big to assemble, so instead you get the 3-instruction generate immediate/add PC sequence.

The advantage is that it can reach any 32-bit address; the tradeoff is that it takes up more space in the program image and instruction cache - adr alone is 2 or 4 bytes (depending on the offset and target register), while the movw/movt/add sequence weighs in at 10 bytes and takes at least twice as long to execute.
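To make the movw/movt part concrete, here is a minimal Python sketch (a model, not actual toolchain code) of how a 32-bit offset could be split into the :lower16: and :upper16: halves and put back together in r0, along with a check against the +/-4095 limit mentioned above. The function names and the sample offset are made up for illustration.

```python
ADR_MAX_OFFSET = 4095  # Thumb adr limit quoted in the answer


def fits_adr(offset):
    """True if a signed byte offset is within the +/-4095 range."""
    return -ADR_MAX_OFFSET <= offset <= ADR_MAX_OFFSET


def split_offset(offset):
    """Split a signed offset into its (:lower16:, :upper16:) halves."""
    value = offset & 0xFFFFFFFF  # two's-complement 32-bit view
    return value & 0xFFFF, value >> 16


def movw_movt(lower16, upper16):
    """Model movw (writes low half, zeroes high half), then movt (writes high half)."""
    r0 = lower16 & 0xFFFF                            # movw r0, #lower16
    r0 = (r0 & 0xFFFF) | ((upper16 & 0xFFFF) << 16)  # movt r0, #upper16
    return r0


offset = 0x12340  # hypothetical distance, only known once the linker has run
lower, upper = split_offset(offset)
print(fits_adr(offset), hex(lower), hex(upper), hex(movw_movt(lower, upper)))
```

With an offset this large, fits_adr reports that adr could not encode it, while the two 16-bit halves still rebuild the full value in r0.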

As for why the PC offset is folded into the section offset, well, why wouldn't it be? Both are constant, so when the linker is calculating the distance between LPC1_0 and L_.str in the final image to encode the immediate value into the movw/movt instructions, it has absolutely nothing to gain by not adding the PC correction at the same time. That's why the 2-instruction fetch/execute offset of the original ARM's 3-stage pipeline was exposed in the first place, because it was considerably simpler to fix up addresses in the assembler/linker when building software, than to implement all the logic to "correct" it in hardware.
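The folding of the PC correction can be checked with simple arithmetic. The sketch below is a Python model under assumed addresses (0x8008 for LPC1_0 and 0x12345 for L_.str are hypothetical), and it assumes, as the +4 in the relocation expression implies, that add r0, pc in Thumb state reads pc as the address of the add plus 4.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide, so arithmetic wraps


def resolve(lpc_addr, str_addr):
    """Return what r0 holds after movw/movt/add for the given addresses."""
    # Link time: the immediate spread across movw/movt, with the +4 folded in.
    imm = (str_addr - (lpc_addr + 4)) & MASK32
    # Run time: reading pc at LPC1_0 yields the instruction address + 4 (assumed).
    pc = (lpc_addr + 4) & MASK32
    # add r0, pc
    return (imm + pc) & MASK32


LPC1_0 = 0x8008   # hypothetical address of the add instruction
L_STR = 0x12345   # hypothetical address of the string
print(hex(resolve(LPC1_0, L_STR)))
```

Whatever addresses the linker picks, the +4 baked into the immediate cancels against the +4 that shows up when pc is read, so r0 ends up holding L_.str itself. The same holds when the string sits below the code, since the negative offset simply wraps in 32 bits.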
