Lorenzo B

Reputation: 33428

Add command under ARM architecture and program counter

I'm focusing on a snippet of ARM assembly where the add command is used. The snippet, shown below, simply says: add to the program counter the offset needed to find the string stored at L_.str, where L_.str is the symbol (the address) of a string contained in the data segment.

movw    r0, :lower16:(L_.str-(LPC1_0+4))
movt    r0, :upper16:(L_.str-(LPC1_0+4))
LPC1_0:
    add r0, pc

The first two instructions (movw and movt) load the 32-bit number representing the address of that string. I'm in Thumb mode, right? OK, that said, I have difficulty figuring out the overall memory layout. Is the following the right representation of the code segment in memory? In addition, are LPC1_0 and L_.str the base address of add r0, pc and the address of the "A simple string" string, respectively? What is the size of each box: 32 or 64 bits, depending on the architecture?

--------------------------------------------
| movw    r0, :lower16:(L_.str-(LPC1_0+4)) |
--------------------------------------------
| movt    r0, :upper16:(L_.str-(LPC1_0+4)) |
-------------------------------------------- LPC1_0
| add r0, pc                               |
--------------------------------------------
                       .
                       .
                       .
-------------------------------------------- L_.str
| "A simple string"                        |
--------------------------------------------

If so, I could just retrieve the offset (that will be added to the pc) using the difference L_.str-LPC1_0. But here +4 is also taken into account.

From ADD, pc or sp relative

ADD Rd, Rp, #expr

If Rp is the pc, the value used is: (the address of the current instruction + 4) AND &FFFFFFFC.

So it appears that if Rp is the pc, I also need to take 4 more bytes into account for the offset. OK, so where are these bytes added? Why are these 4 bytes accounted for in the mov instructions and not just before the add command? Is this an optimization introduced by the compiler?

Upvotes: 1

Views: 1421

Answers (2)

Notlikethat

Reputation: 20974

The normal position-independent "get the address of something" instruction would simply be adr r0, L_.str (which is equivalent to having the assembler/linker automatically calculate an appropriate offset for add r0, pc, #offset). However, since the ARM architecture uses fixed-width encodings (ARM instructions are 32 bits wide, Thumb instructions are either 16 or 32 bits), only a limited number of bits in the instruction are available to encode the immediate value for the offset, so the maximum range is limited. The maximum possible offset that a Thumb encoding of adr can support is +/-4095 bytes. Since the compiler has no idea how far apart the linker will put the sections, it can't safely emit adr for risk of the final offset being too big to assemble, so instead you get the 3-instruction generate-immediate/add-PC sequence. The advantage is that it can reach any 32-bit address; the tradeoff is that it takes up more space in the program image and instruction cache: adr alone is 2 or 4 bytes (depending on the offset and target register), while the movw/movt/add sequence weighs in at 10 bytes and takes at least twice as long to execute.
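To make the range argument concrete, here is a rough Python sketch of the two checks involved. It only models the arithmetic, not a real assembler: the helper names and the example distance 0x12344 are made up for illustration, and the +/-4095 limit is the figure quoted above.

```python
# Sketch only: models the arithmetic, not real assembler/linker behaviour.
# All names and the example offset are hypothetical.

ADR_THUMB_MAX = 4095  # reach of the Thumb adr encoding quoted above, in bytes


def fits_adr(offset):
    """True if a PC-relative offset is within reach of a single adr."""
    return -ADR_THUMB_MAX <= offset <= ADR_THUMB_MAX


def split_movw_movt(value):
    """Split a 32-bit value into the :lower16: and :upper16: halves."""
    value &= 0xFFFFFFFF  # wrap negative offsets to 32 bits
    return value & 0xFFFF, (value >> 16) & 0xFFFF


def combine_movw_movt(lower16, upper16):
    """movw writes the low half (clearing the top), movt fills the top half."""
    return ((upper16 << 16) | lower16) & 0xFFFFFFFF


offset = 0x12344  # made-up distance between code and string
lower16, upper16 = split_movw_movt(offset)
print(fits_adr(offset), hex(lower16), hex(upper16))
```

Any 32-bit distance survives the split and recombine, which is why the longer sequence is always safe for the compiler to emit.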

As for why the PC offset is folded into the section offset, well, why wouldn't it be? Both are constant, so when the linker is calculating the distance between LPC1_0 and L_.str in the final image to encode the immediate value into the movw/movt instructions, it has absolutely nothing to gain by not adding the PC correction at the same time. That's why the 2-instruction fetch/execute offset of the original ARM's 3-stage pipeline was exposed in the first place, because it was considerably simpler to fix up addresses in the assembler/linker when building software, than to implement all the logic to "correct" it in hardware.
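As a worked example of that folding, here is a small Python model of the arithmetic. The addresses are invented, and it assumes that in Thumb state the register form add r0, pc reads pc as the address of the add instruction plus 4, which is what the LPC1_0+4 in the expression suggests.

```python
# Sketch with made-up addresses; only the arithmetic is modelled.
MASK32 = 0xFFFFFFFF


def run_sequence(lpc_addr, str_addr, fold_pc_bias=True):
    """Return r0 after movw/movt/add for the given label addresses."""
    bias = 4 if fold_pc_bias else 0
    # Constant the linker encodes into movw/movt: L_.str - (LPC1_0 + 4)
    r0 = (str_addr - (lpc_addr + bias)) & MASK32
    # Assumption: in Thumb state, pc reads as this instruction's address + 4
    pc_read = (lpc_addr + 4) & MASK32
    # add r0, pc
    return (r0 + pc_read) & MASK32


LPC1_0 = 0x00008008  # hypothetical address of the add instruction
L_str = 0x0001A350   # hypothetical address of the string

print(hex(run_sequence(LPC1_0, L_str)))
print(hex(run_sequence(LPC1_0, L_str, fold_pc_bias=False)))
```

With the +4 folded into the constant, r0 ends up exactly at the string. Without it, the result overshoots by 4 bytes and an extra instruction would be needed to correct it.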

Upvotes: 1

DThought

Reputation: 1314

My educated guess:

You want to get the "absolute" address where L_.str is in memory. movw and movt seem to load immediate values, so the value is inside the opcode.

The compiler calculates the offset between LPC1_0 and L_.str, and subtracts another 4 (bytes).

The add r0, pc instruction then adds the pc value, which reads as the instruction's address + 4, to that value.

The +4 is added by the processor. I think it is because the pc is incremented quite early in the processor's "logic", and the add can only read the value of pc afterwards. It's simpler to document that it is really pc+4 than to add extra logic in the processor to compute pc+4-4...

The advantage of this whole approach to calculating the address of L_.str is that it's independent of where the code is relocated.
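A quick way to convince yourself of the relocation point is to model it in Python. This is only a sketch: the section offsets and load addresses are invented, and it assumes code and string move together by the same amount when the image is loaded elsewhere.

```python
# Sketch: the constant is fixed at link time, the load address varies.
MASK32 = 0xFFFFFFFF

ADD_OFFSET = 0x8      # hypothetical offset of "add r0, pc" within the image
STR_OFFSET = 0x12350  # hypothetical offset of the string within the image

# Value baked into movw/movt; it depends only on the distance, not the base
LINK_TIME_CONST = (STR_OFFSET - (ADD_OFFSET + 4)) & MASK32


def resolved_address(load_base):
    """Address left in r0 when the image is loaded at load_base."""
    pc_read = (load_base + ADD_OFFSET + 4) & MASK32  # Thumb pc reads as +4
    return (LINK_TIME_CONST + pc_read) & MASK32


for base in (0x00000000, 0x00010000, 0x40000000):
    print(hex(base), hex(resolved_address(base)))
```

The same constant yields the right address for every base, so nothing in the instruction stream needs patching at load time.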

Upvotes: 1
