Reputation: 519
This system call code is not working at all. The compiler is optimizing things out and generally behaving strangely:
template <typename... Args>
inline void print(Args&&... args)
{
    char buffer[1024];
    auto res = strf::to(buffer) (std::forward<Args> (args)...);
    const size_t size = res.ptr - buffer;
    register const char* a0 asm("a0") = buffer;
    register size_t a1 asm("a1") = size;
    register long syscall_id asm("a7") = ECALL_WRITE;
    register long a0_out asm("a0");
    asm volatile ("ecall" : "=r"(a0_out)
        : "m"(*(const char(*)[size]) a0), "r"(a1), "r"(syscall_id) : "memory");
}
This is a custom system call that takes a buffer and a length as arguments. If I write this using global assembly it works as expected, but the generated code has generally been extraordinarily good when I write the wrappers inline.
A function that calls the print function with a constant string produces invalid machine code:
0000000000120f54 <start>:
start():
120f54: fa1ff06f j 120ef4 <public_donothing-0x5c>
-->
120ef4: 747367b7 lui a5,0x74736
120ef8: c0010113 addi sp,sp,-1024
120efc: 55478793 addi a5,a5,1364 # 74736554 <add_work+0x74615310>
120f00: 00f12023 sw a5,0(sp)
120f04: 00a00793 li a5,10
120f08: 00f10223 sb a5,4(sp)
120f0c: 000102a3 sb zero,5(sp)
120f10: 00500593 li a1,5
120f14: 06600893 li a7,102
120f18: 00000073 ecall
120f1c: 40010113 addi sp,sp,1024
120f20: 00008067 ret
It's not loading a0 with the buffer at sp.
What am I doing wrong?
Upvotes: 1
Views: 1325
Reputation: 363882
It's not loading a0 with the buffer at sp.
Because you didn't ask for a pointer as an "r" input in a register. The one and only guaranteed/supported behaviour of T foo asm("a0") is to make an "r" constraint (including +r or =r) pick that register.
But you used "m" to let it pick an addressing mode for that buffer, not necessarily 0(a0), so it probably picked an SP-relative mode. If you add asm comments inside the template like "ecall # 0 = %0 1 = %1 2 = %2" you can look at the compiler's asm output and see what it picked. (With clang, use -no-integrated-as so asm comments in the template come through in the -S output.)
Wrapping a system call does need the pointer in a specific register, i.e. using "r" or "+r":
asm volatile ("ecall # 0=%0 1=%1 2=%2 3=%3 4=%4"
    : "=r"(a0_out)
    : "r"(a0), "r"(a1), "r"(syscall_id), "m"(*(const char(*)[size]) a0)
    : // "memory" unneeded; the "m" input tells the compiler which memory is read
);
That "m" input can be used instead of the "memory" clobber, not instead of an "r" pointer input. (For write specifically, because it only reads that one area of pointed-to memory and has no other side effects on memory user-space can see, only on kernel write buffers and file-descriptor positions, which aren't C objects this program can access directly. For a read call, you'd need the memory to be an output operand.)
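For the read direction, a rough sketch of what that could look like is below. This is only an illustration: sys_read_buffer and ECALL_READ (with the number 103) are hypothetical names and values, not something from the question. The non-RISC-V branch is a stand-in that copies a fixed string so the sketch can be compiled and run off-target; it does not perform a real system call.

```cpp
#include <cassert>
#include <cstddef>
#if !defined(__riscv)
#include <cstring>   // only needed by the host stand-in below
#endif

// Hypothetical syscall number for a custom "fill this buffer" ecall.
constexpr long ECALL_READ = 103;

inline long sys_read_buffer(char* buffer, std::size_t size)
{
    assert(buffer != nullptr);
#if defined(__riscv)
    register char* a0 asm("a0") = buffer;
    register std::size_t a1 asm("a1") = size;
    register long syscall_id asm("a7") = ECALL_READ;
    register long a0_out asm("a0");
    asm volatile ("ecall"
        : "=r"(a0_out),
          // Dummy memory output: tells the compiler the asm writes buffer[0..size).
          // The array-of-runtime-size cast is a GNU extension in C++.
          // "+m" would be the more conservative choice if a short read must
          // leave the compiler believing the old tail contents are still live.
          "=m"(*(char(*)[size]) a0)
        : "r"(a0), "r"(a1), "r"(syscall_id));
    return a0_out;
#else
    // Host stand-in (assumption for off-target testing): pretend the
    // "kernel" delivered the bytes "host".
    static const char fake[] = "host";
    const std::size_t n = size < sizeof fake - 1 ? size : sizeof fake - 1;
    std::memcpy(buffer, fake, n);
    return static_cast<long>(n);
#endif
}
```

The pointer still goes in as an "r" input; the "=m" operand is only there to describe which memory gets written, so no "memory" clobber is needed.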
With optimization disabled, compilers do typically pick another register as the base for the "m" input (e.g. 0(a5) for GCC), but with optimization enabled GCC picks 0(a0) so it doesn't cost extra instructions. Clang still picks 0(a2), wasting an instruction to set up that pointer, even though the "=r"(a0_out) is not early-clobber. (Godbolt, with a very cut-down version of the function that doesn't call strf::to, whatever that is, just copies a byte into the buffer.)
Interestingly, with optimization enabled for my cut-down stand-alone version of the function without fixing the bug, GCC and clang do happen to put a pointer to buffer into a0, picking 0(a0) as the template expansion for that operand (see the Godbolt link above). This seems to be a missed optimization vs. using 16(sp); I don't see why they'd need the buffer address in a register at all.
But without optimization, GCC picks ecall # 0 = a0 1 = 0(a5) 2 = a1. (In my simplified version of the function, it sets a5 with mv a5,a0, so it did actually have the address in a0 as well. So it's a good thing you had more code in your function to make it not happen to work by accident, so you could find the bug in your code.)
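Putting the fix together, a minimal sketch of a stand-alone wrapper might look like the following. The name sys_write_buffer is made up for illustration, and the value 102 for ECALL_WRITE is an assumption taken from the li a7,102 in the question's disassembly. The non-RISC-V branch falls back to POSIX write() on fd 1 purely so the sketch can be built and run on a host machine.

```cpp
#include <cassert>
#include <cstddef>
#if !defined(__riscv)
#include <unistd.h>  // host fallback only: POSIX write()
#endif

// Assumption: 102 is the custom write syscall number (li a7,102 in the question).
constexpr long ECALL_WRITE = 102;

inline long sys_write_buffer(const char* buffer, std::size_t size)
{
    assert(buffer != nullptr);
#if defined(__riscv)
    register const char* a0 asm("a0") = buffer;
    register std::size_t a1 asm("a1") = size;
    register long syscall_id asm("a7") = ECALL_WRITE;
    register long a0_out asm("a0");
    asm volatile ("ecall"
        : "=r"(a0_out)
        : "r"(a0),           // the pointer itself, forced into a0
          "r"(a1),           // the length, forced into a1
          "r"(syscall_id),   // the syscall number, forced into a7
          // Dummy memory input: tells the compiler buffer[0..size) is read,
          // so the stores that fill it can't be optimized away.
          // The array-of-runtime-size cast is a GNU extension in C++.
          "m"(*(const char(*)[size]) a0));
    return a0_out;
#else
    // Host fallback (assumption for off-target testing): plain write() to stdout.
    return static_cast<long>(::write(1, buffer, size));
#endif
}
```

A print() like the one in the question would then format into its local buffer and call sys_write_buffer(buffer, size), which keeps the register-pinned variables in a small function with no other calls between their initialization and the asm statement.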
Upvotes: 2