Eugene Sh.

Reputation: 18381

Define a fixed-position section within a Linker Script of position-independent executable

I am trying to build a static position-independent executable with the gcc option -static-pie. The target is bare-metal RISC-V, so there is no OS and no dynamic loader. I have a linker script similar to the following:

ENTRY(_start)
SECTIONS {
  .text (READONLY) : ALIGN(64) {
    startup.o(.text.startup)
    *(EXCLUDE_FILE(startup.o) .text .text.*)
  }
  .rodata (READONLY) : ALIGN(64) {
    *(.srodata .srodata.*)
    *(.rodata .rodata.*)
    *(.got .got.plt)
  }
  .data ALIGN(64): {
    __global_pointer$ = . + 0x800;
    *(.sdata .sdata.*)
    *(.data .data.*)
  }
  .bss (NOLOAD): ALIGN(64) {
    _bss_start = .;
    *(.sbss .sbss.*)
    *(.bss .bss.*)
    _bss_end = .;
  }
  .stack (NOLOAD): ALIGN(64) {
    _stack_start = .;
    . = . + 0x400;
    _stack_end = .;
  }
}

and when compiling with -fpie and linking with -static-pie it appears to produce a correct PIE binary that runs correctly from any address it is loaded to.

Now, to the problem. Consider we have a special memory region at fixed address which I want the program to use (for example some shared memory with another processor) and I want to define it via the linker script. With position dependent code I would do something like this:

In the code:

    __attribute__((section(".special_section")))
    volatile uint8_t shared_mem[100];

In the linker script:

SECTIONS {
.....
    .special_section 0x12340000 (NOLOAD): {
        *(.special_section)
    }
.....
}

and this will ensure that the array shared_mem is located at the fixed address 0x12340000.

However this does not work with static-pie. The accesses to shared_mem which are generated by the compiler are relative to the address the binary is loaded to (that's the idea of PIE, right?). The question is - is there a way to define a specific output section to have an absolute address?

UPDATE: Here is more info of what I am seeing. With the linker script as above and the extra section as follows:

  .special_section 0x1234AB00 (NOLOAD): {
    *(.special_section)
  }

the following code (except the startup code, which I omit):

#include <stdint.h>

__attribute__((section(".special_section")))
volatile uint8_t shared_mem[100];

int main(void) {
    shared_mem[0] = 0x55;
    while (1);
    return 0;
}

the generated (interleaved) assembly looks like this:

__attribute__((section(".special_section")))
volatile uint8_t shared_mem[100];

    int main(void) {
        shared_mem[0] = 0x55;
      34:   05500793            li  a5,85
      38:   1234b717            auipc   a4,0x1234b
      3c:   acf70423            sb  a5,-1336(a4) # 1234ab00 <shared_mem>
        while (1);
      40:   a001                    j   40 <main+0xc>

As we can see, the destination address in a4 is formed using the PC-relative instruction auipc, which adds the current PC value (0x38) to (0x1234b << 12) = 0x1234_B000, and then a5 = 0x55 is stored at that address at offset -1336 = -0x538 (that is, 0x38 + 0x1234B000 - 0x538 = 0x1234AB00, as expected).

However, if the program is loaded to an address other than 0x0, the PC in the above calculation will be different, so the destination address of the operation will be different too.

Upvotes: 1

Views: 71

Answers (1)

Mike Kinghan

Reputation: 61575

When you tell the linker:

SECTIONS {
.....
    .special_section 0x12340000 (NOLOAD): {
        *(.special_section)
    }
.....
}

in your linker script, you are telling it that output section .special_section is to have the VMA (Virtual Memory Address) 0x12340000, meaning that it will have offset 0x12340000 from the start of the program's memory image at runtime. See the ld manual: 3.6.3 Output Section Address. Then if the program is loaded into some physical address space at offset PADDR in that address space, a symbol defined at the start of .special_section - e.g. your shared_mem - will be found at PMA (physical memory address) [1] PADDR + 0x12340000.

You can call this a "fixed-position" section in contrast with the usual kind, e.g.

.rela.plt :
{
  *(.rela.plt)
  *(.rela.iplt)
}

In the absence of an address specifier, an explicit assignment of the linker's location counter, or an alignment specification, this example just takes the next VMA implied by the linker's default heuristic, which varies from linkage to linkage depending on the number and sizes of the input sections mapped before this one since the VMA was last assigned a fixed value per the script. So .rela.plt might land at different runtime addresses PADDR + 0xNNN... in different images generated per the same script, but shared_mem will always land at the start of .special_section at PADDR + 0x12340000.

Once the program image is output by the linker, every symbol it defines has fixed VMA, of course.

A PIE program (Position-Independent Executable) is one that can run at any PADDR. It can do so thanks to PC-relative addressing, which means that the program finds the address of a symbol never by reference to any constant PMA but always as an offset from the processor's PC (Program Counter). The PC has different mnemonic names on different architectures (rip on Intel, pc on ARM and RISC-V): it holds the PMA of the next instruction up for execution and thus expresses the program's-POV concept of here. PC-relative addressing locates symbols exclusively by their distance from here, at any point in execution. The implementation differs on different architectures. What is located in this way is the symbol's PMA (nothing else is any use at runtime), because it is found at an invariant distance from the PMA in the PC. That distance is invariant because a) The PMA in the PC is an invariant distance from PADDR for any given point in an execution, and b) the PMA of the symbol is an invariant distance from PADDR in every execution.

The linker knows the VMA of every symbol because it assigned it, and it can trivially calculate the VMA of any instruction. So the linker can and does calculate the distance required to reference a symbol in any PC-relative object code instruction that loads the symbol's address to a register, and it physically patches that distance into the machine code instruction in the output image.

So,

As we can see, the destination address in a4 is formed using the PC-relative instruction auipc, which adds the current PC value (0x38) to (0x1234b << 12) = 0x1234_B000, and then a5 = 0x55 is stored at that address at offset -1336 = -0x538 (that is, 0x38 + 0x1234B000 - 0x538 = 0x1234AB00, as expected).

However, if the program is loaded to an address other than 0x0, the PC in the above calculation will be different, so the destination address of the operation will be different too.

Yes it will, because the PMA in the PC at any time depends on PADDR, but the target address will still be the PMA of shared_mem, because:

  • If PADDR is made = 0x0 + N, then the PMA of shared_mem becomes VMA(shared_mem) + N.
  • If the PMA in the PC at any execution point when the distance to shared_mem is calculated is D for PADDR = 0x0, then for PADDR = N the PMA in the PC will be D + N at the same point in execution.

DIST1 = (X - Y) = DIST2 = ((X + N) - (Y + N)).

You have position-independent code. You have to ensure that the intended content appears at PMA(shared_mem) come runtime.

"Tick one box".

It's not clear whether you actually want .special_section to be at a predetermined VMA per the linker script or to be at a predetermined PMA in the address space of your target hardware.

If the former, then you (and probably your collaborators) must ensure the hardware programming at the end of your build process places the intended content of .special_section at a PMA PSS on the device such that when PADDR is the load address specified in the build, then PSS = PADDR + 0x1234AB00. The build derives PSS.

If the latter, with PSS presumably dictated by the hardware, then on the other hand you must derive the VMA of .special_section in the build. Instead of 0x1234AB00 it must be a parameter VSS evaluated in the linker script such that PADDR + VSS = PSS (in effect, you generate the linker script at build time).

These alternatives both assume that you resolve the VMA of .special_section in the linker script. In principle there is a third, which is to make your program dynamically discover where the special content is on the device at runtime, e.g. by scanning some memory range for some invariant identifying features. Naturally I'm ignorant of the practicality of that.


1. Unlike "VMA", "PMA" is not a received acronym, but I'm adopting it for convenience.

Upvotes: 0
