gonzo

Reputation: 519

RISC-V inline assembly using memory not behaving correctly

This system call code is not working at all. The compiler is optimizing things out and generally behaving strangely:

template <typename... Args>
inline void print(Args&&... args)
{
    char buffer[1024];
    auto res = strf::to(buffer) (std::forward<Args> (args)...);
    const size_t size = res.ptr - buffer;

    register const char* a0 asm("a0") = buffer;
    register size_t      a1 asm("a1") = size;
    register long syscall_id asm("a7") = ECALL_WRITE;
    register long        a0_out asm("a0");

    asm volatile ("ecall" : "=r"(a0_out)
        : "m"(*(const char(*)[size]) a0), "r"(a1), "r"(syscall_id) : "memory");
}

This is a custom system call that takes a buffer and a length as arguments. If I write this using global assembly it works as expected, but the generated code has generally been extraordinarily good when I write the wrappers as inline asm.

A function that calls the print function with a constant string produces invalid machine code:

0000000000120f54 <start>:
start():
  120f54:       fa1ff06f                j       120ef4 <public_donothing-0x5c>
-->
  120ef4:       747367b7                lui     a5,0x74736
  120ef8:       c0010113                addi    sp,sp,-1024
  120efc:       55478793                addi    a5,a5,1364 # 74736554 <add_work+0x74615310>
  120f00:       00f12023                sw      a5,0(sp)
  120f04:       00a00793                li      a5,10
  120f08:       00f10223                sb      a5,4(sp)
  120f0c:       000102a3                sb      zero,5(sp)
  120f10:       00500593                li      a1,5
  120f14:       06600893                li      a7,102
  120f18:       00000073                ecall
  120f1c:       40010113                addi    sp,sp,1024
  120f20:       00008067                ret

It's not loading a0 with the buffer at sp.

What am I doing wrong?

Upvotes: 1

Views: 1325

Answers (1)

Peter Cordes

Reputation: 363882

It's not loading a0 with the buffer at sp.

Because you didn't ask for a pointer as an "r" input in a register. The one and only guaranteed/supported behaviour of T foo asm("a0") is to make an "r" constraint (including +r or =r) pick that register.

But you used "m" to let it pick an addressing mode for that buffer, not necessarily 0(a0), so it probably picked an SP-relative mode. If you add asm comments inside the template like "ecall # 0 = %0 1 = %1 2 = %2" you can look at the compiler's asm output and see what it picked. (With clang, use -no-integrated-as so asm comments in the template come through in the -S output.)

Wrapping a system call does need the pointer in a specific register, i.e. using "r" or "+r":

    asm volatile ("ecall  # 0=%0   1=%1  2=%2  3=%3  4=%4"
        : "=r"(a0_out)
        : "r"(a0), "r"(a1), "r"(syscall_id), "m"(*(const char(*)[size]) a0)
        : // "memory"  unneeded; the "m" input tells the compiler which memory is read
    );

That "m" input can be used instead of the "memory" clobber, not instead of an "r" pointer input. (This works for write specifically, because it only reads that one area of pointed-to memory and has no other side effects on memory that user-space can see, only on kernel write buffers and file-descriptor positions, which aren't C objects this program can access directly. For a read call, you'd need the memory to be an output operand.)
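To make the write/read distinction concrete, here is a minimal sketch of both wrappers as stand-alone functions. Several things here are assumptions and not from the question: the names sys_write and sys_read, the ECALL_READ number (the question only shows 102 for write), and the non-RISC-V fallback branch, which exists only so the file compiles and runs on a host machine. The pointer-to-array cast with a runtime length is the same GNU extension the question already uses.

```cpp
#include <cstddef>

// Assumed syscall numbers for the custom environment. 102 matches the
// `li a7,102` in the question's disassembly; 103 is purely hypothetical.
constexpr long ECALL_WRITE = 102;
constexpr long ECALL_READ  = 103;

// Write-style call: the callee only reads `len` bytes starting at `data`.
inline long sys_write(const char* data, std::size_t len)
{
#if defined(__riscv)
    register const char* a0 asm("a0") = data;   // pointer must be an "r" input
    register std::size_t a1 asm("a1") = len;
    register long        a7 asm("a7") = ECALL_WRITE;
    register long        a0_out asm("a0");

    asm volatile ("ecall"
        : "=r"(a0_out)
        : "r"(a0), "r"(a1), "r"(a7),
          "m"(*(const char(*)[len]) data)       // memory the call reads
    );
    return a0_out;
#else
    // Host stand-in (assumption): no ecall here, report every byte as written.
    (void)data;
    return static_cast<long>(len);
#endif
}

// Read-style call: the callee writes into the buffer, so the pointed-to
// memory is an output operand. "+m" would be the more conservative choice
// if the caller cares about bytes the call leaves untouched.
inline long sys_read(char* buf, std::size_t cap)
{
#if defined(__riscv)
    register char*       a0 asm("a0") = buf;
    register std::size_t a1 asm("a1") = cap;
    register long        a7 asm("a7") = ECALL_READ;
    register long        a0_out asm("a0");

    asm volatile ("ecall"
        : "=r"(a0_out),
          "=m"(*(char(*)[cap]) buf)             // memory the call writes
        : "r"(a0), "r"(a1), "r"(a7)
    );
    return a0_out;
#else
    // Host stand-in (assumption): nothing is read, the buffer is untouched.
    (void)buf;
    (void)cap;
    return 0;
#endif
}
```

With something like this, the print function from the question would only need to call sys_write(buffer, size) after formatting. In both wrappers the address reaches the callee through the "r" operand; the "m" or "=m" operand exists purely to describe the memory dependency to the compiler.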

With optimization disabled, compilers do typically pick another register as the base for the "m" input (e.g. 0(a5) for GCC), but with optimization enabled GCC picks 0(a0) so it doesn't cost extra instructions. Clang still picks 0(a2), wasting an instruction to set up that pointer, even though the "=r"(a0_out) is not early-clobber. (Godbolt, with a very cut-down version of the function that doesn't call strf::to, whatever that is, just copies a byte into the buffer.)


Interestingly, with optimization enabled for my cut-down stand-alone version of the function without fixing the bug, GCC and clang do happen to put a pointer to buffer into a0, picking 0(a0) as the template expansion for that operand (see the Godbolt link above). This seems to be a missed optimization vs. using 16(sp); I don't see why they'd need the buffer address in a register at all.

But without optimization, GCC picks ecall # 0 = a0 1 = 0(a5) 2 = a1. (In my simplified version of the function, it sets a5 with mv a5,a0, so it did actually have the address in a0 as well. So it's a good thing you had more code in your function to make it not happen to work by accident, so you could find the bug in your code.)

Upvotes: 2
