Reputation: 18381
I am trying to build a static position-independent executable using gcc's -static-pie option. The target is bare-metal RISC-V, so there is no OS and no dynamic loader. I have a linker script similar to the following:
ENTRY(_start)
SECTIONS {
    .text (READONLY) : ALIGN(64) {
        startup.o(.text.startup)
        *(EXCLUDE_FILE(startup.o) .text .text.*)
    }
    .rodata (READONLY) : ALIGN(64) {
        *(.srodata .srodata.*)
        *(.rodata .rodata.*)
        *(.got .got.plt)
    }
    .data ALIGN(64): {
        __global_pointer$ = . + 0x800;
        *(.sdata .sdata.*)
        *(.data .data.*)
    }
    .bss (NOLOAD): ALIGN(64) {
        _bss_start = .;
        *(.sbss .sbss.*)
        *(.bss .bss.*)
        _bss_end = .;
    }
    .stack (NOLOAD): ALIGN(64) {
        _stack_start = .;
        . = . + 0x400;
        _stack_end = .;
    }
}
When I compile with -fpie and link with -static-pie, this produces what appears to be a correct PIE binary that runs correctly from any address it is loaded to.
Now, to the problem. Consider we have a special memory region at fixed address which I want the program to use (for example some shared memory with another processor) and I want to define it via the linker script. With position dependent code I would do something like this:
In the code:
__attribute__((section(".special_section")))
volatile uint8_t shared_mem[100];
In the linker script:
SECTIONS {
.....
.special_section 0x12340000 (NOLOAD): {
*(.special_section)
}
.....
}
and this will ensure that the array shared_mem is located at the fixed address 0x12340000.
However, this does not work with -static-pie. The accesses to shared_mem that the compiler generates are relative to the address the binary is loaded to (that's the idea of PIE, right?). The question is: is there a way to give a specific output section an absolute address?
UPDATE: Here is more info of what I am seeing. With the linker script as above and the extra section as follows:
.special_section 0x1234AB00 (NOLOAD): {
*(.special_section)
}
the following code (except the startup code, which I omit):
#include <stdint.h>
__attribute__((section(".special_section")))
volatile uint8_t shared_mem[100];
int main(void) {
shared_mem[0] = 0x55;
while (1);
return 0;
}
the generated (interleaved) assembly looks like this:
__attribute__((section(".special_section")))
volatile uint8_t shared_mem[100];
int main(void) {
shared_mem[0] = 0x55;
34: 05500793 li a5,85
38: 1234b717 auipc a4,0x1234b
3c: acf70423 sb a5,-1336(a4) # 1234ab00 <shared_mem>
while (1);
40: a001 j 40 <main+0xc>
As we can see, the destination address in a4 is formed using the PC-relative instruction auipc, which adds the current PC = 0x38 to (0x1234b << 12) = 0x1234_B000, and then a5 = 0x55 is stored relative to that address at offset -1336 = -0x538 (that is, 0x38 + 0x1234B000 - 0x538 = 0x1234AB00, as expected).
However, if the program is loaded to an address other than 0x0, the PC in the above calculation will be different, so the destination address of the operation will be different too.
Reputation: 61575
When you tell the linker:
SECTIONS {
.....
.special_section 0x12340000 (NOLOAD): {
*(.special_section)
}
.....
}
in your linker script, you are telling it that output section .special_section is to have the VMA (Virtual Memory Address) 0x12340000, meaning that it will have offset 0x12340000 from the start of the program's memory image at runtime. See the ld manual: 3.6.3 Output Section Address. Then if the program is loaded into some physical address space at offset PADDR in that address space, a symbol defined at the start of .special_section - e.g. your shared_mem - will be found at PMA (physical memory address) PADDR + 0x12340000.
You can call this a "fixed-position" section in contrast with the usual kind, e.g.
.rela.plt :
{
*(.rela.plt)
*(.rela.iplt)
}
In the absence of an address specifier, an explicit assignment of the linker's location counter, or an alignment specification, this example just takes the next VMA implied by the linker's default heuristic, which varies from linkage to linkage depending on the number and sizes of the input sections mapped before this one since the VMA was last assigned a fixed value per the script. So .rela.plt might land at different runtime addresses PADDR + 0xNNN... in different images generated per the same script, but shared_mem will always land at the start of .special_section at PADDR + 0x12340000.
Once the program image is output by the linker, every symbol it defines has a fixed VMA, of course.
A PIE program (Position-Independent Executable) is one that can run at any PADDR. It can do so thanks to PC-relative addressing, which means that the program finds the address of a symbol never by reference to any constant PMA but always as an offset from the processor's PC (Program Counter). The PC has different mnemonic names on different architectures (rip on Intel, pc on ARM and RISC-V): it holds the PMA of the next instruction up for execution and thus expresses the program's-POV concept of here. PC-relative addressing locates symbols exclusively by their distance from here, at any point in execution. The implementation differs on different architectures.
What is located in this way is the symbol's PMA (nothing else is any use at runtime), because it is found at an invariant distance from the PMA in the PC. That distance is invariant because a) the PMA in the PC is an invariant distance from PADDR for any given point in an execution, and b) the PMA of the symbol is an invariant distance from PADDR in every execution.
The linker knows the VMA of every symbol because it assigned it, and it can trivially calculate the VMA of any instruction. So for any PC-relative object code instruction that loads a symbol's address into a register, the linker can and does calculate the distance required to reference the symbol, and it physically patches that distance into the machine code instruction in the output image.
So,
As we can see, the destination address in a4 is formed using the PC-relative instruction auipc, which adds the current PC = 0x38 to (0x1234b << 12) = 0x1234_B000, and then a5 = 0x55 is stored relative to that address at offset -1336 = -0x538 (that is, 0x38 + 0x1234B000 - 0x538 = 0x1234AB00, as expected).
However, if the program is loaded to an address other than 0x0, the PC in the above calculation will be different, so the destination address of the operation will be different too.
Yes it will, because the PMA in the PC at any time depends on PADDR, but the target address will still be the PMA of shared_mem, because:
If PADDR changes from 0x0 to N, the PMA of shared_mem becomes VMA(shared_mem) + N.
If the PMA in the PC at the point where the address of shared_mem is calculated is D for PADDR = 0x0, then for PADDR = N the PMA in the PC will be D + N at the same point in execution.
DIST1 = (X - Y) = DIST2 = ((X + N) - (Y + N)).
You have position-independent code. You have to ensure that the intended content appears at PMA(shared_mem) come runtime.
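The invariance argument can be checked numerically with the values from the disassembly in the question. The following is a minimal sketch, assuming 32-bit address arithmetic; the function names auipc_target and store_target are made up for illustration and are not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Address formed by `auipc rd, imm20` at `pc`, followed by a load/store
 * that applies a signed 12-bit offset to rd. Assumes 32-bit addresses. */
uint32_t auipc_target(uint32_t pc, uint32_t imm20, int32_t off12)
{
    assert(off12 >= -2048 && off12 <= 2047); /* range of a 12-bit immediate */
    return pc + (imm20 << 12) + (uint32_t)off12;
}

/* Target of the `sb` from the question when the image is loaded at `paddr`.
 * The auipc sits at image offset 0x38, so its PC is paddr + 0x38. */
uint32_t store_target(uint32_t paddr)
{
    return auipc_target(paddr + 0x38u, 0x1234bu, -1336);
}
```

For any load address N, store_target(N) evaluates to N + 0x1234AB00, i.e. PADDR + VMA(shared_mem), which is the point the answer makes: the store tracks the image, not a fixed physical address.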
"Tick one box".
It's not clear whether you actually want .special_section to be at a predetermined VMA per the linker script or at a predetermined PMA in the address space of your target hardware.
If the former, then you (and probably your collaborators) must ensure the hardware programming at the end of your build process places the intended content of .special_section at a PMA PSS on the device such that when PADDR is the load address specified in the build, then PSS = PADDR + 0x1234AB00. The build derives PSS.
If the latter, with PSS presumably dictated by the hardware, then on the other hand you must derive the VMA of .special_section in the build. Instead of 0x1234AB00 it must be a parameter VSS evaluated in the linker script such that PADDR + VSS = PSS (in effect, you generate the linker script at build time).
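The relation PADDR + VSS = PSS can be turned into a small build-time calculation. This is only a sketch of the arithmetic, assuming 32-bit addresses and a load address that is known when the linker script is generated; the function names and the example load address 0x00010000 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* VMA to assign to .special_section so that it lands at physical address
 * `pss` when the image is loaded at `paddr` (arithmetic modulo 2^32). */
uint32_t special_section_vma(uint32_t pss, uint32_t paddr)
{
    return pss - paddr;
}

/* Sanity check of the relation PADDR + VSS = PSS for a given pair. */
void check_placement(uint32_t pss, uint32_t paddr)
{
    uint32_t vss = special_section_vma(pss, paddr);
    assert((uint32_t)(paddr + vss) == pss);
}
```

With PSS = 0x12340000 and a hypothetical PADDR = 0x00010000, the script would need VSS = 0x12330000. The obvious limitation is that the result is only valid for that one load address, which is why the linker script has to be regenerated per build.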
These alternatives both assume that you resolve the VMA of .special_section in the linker script.
In principle there is a third, which is to make your program dynamically discover where the special content is on the device at runtime, e.g. by scanning some memory range for some invariant identifying features. Naturally I'm ignorant of the practicality of that.
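For illustration only, such a scan could look like the sketch below. It assumes the special region begins with a known byte signature and that the range being scanned is safe to read; the function name find_marker and the idea of a leading signature are assumptions, not something the question describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return a pointer to the first occurrence of `magic` within
 * [start, start + len), or NULL if the signature is not present. */
const uint8_t *find_marker(const uint8_t *start, size_t len,
                           const uint8_t *magic, size_t magic_len)
{
    assert(magic_len > 0); /* an empty signature would match everywhere */
    for (size_t i = 0; i + magic_len <= len; i++) {
        if (memcmp(start + i, magic, magic_len) == 0)
            return start + i;
    }
    return NULL;
}
```

On a bare-metal target without a C library, memcmp would have to be supplied by the project, and the scanned range must not contain addresses that fault or have read side effects.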