What is the calling convention for floating-point values in C for x86_64 in System V?

I'm currently doing a deep-dive into Assembly land, mainly from the perspective of x86_64, C, and System V AMD64, generally targeting Linux.

The calling convention for integer (and, by implication, pointer) values is pretty straightforward: they are passed in the following registers, in order:

RDI
RSI
RDX
RCX
R8
R9
XMM0–7

Longer argument counts are handled by pushing values onto the stack frame of the subroutine. I got these register names from the Wikipedia page on x86_64 calling conventions.

For larger values like structs and arrays, the convention also seems to be to push into the callee's stack frame.

However, what is the calling convention for floating-point arguments to functions? Are floating point registers used?

Another related question: what if I have mixed argument types?

void mixed(int a, float b, mystruct c) { /* ... */ }

If my function takes an arg list like this, how do I call such a function from Assembly? Which registers are used in interleaved arg lists like this?

Upvotes: 7

Answers (1)

Naftuli Kay

Reputation: 91630

The first up to 8 float / double args are passed in XMM0-7, one per register, in the low element of the register (as if loaded with movss / movsd), regardless of how many earlier args of other types there are. (This is unlike Windows x64, where only the first 4 total args can be passed in registers.)

A float or double is returned in XMM0.
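
As a minimal sketch of how the integer and floating-point registers are counted independently, consider a function with interleaved argument types. The name `mixed_sum` and its parameters are made up for illustration; the register comments follow the rules described in this answer.

```c
/* Hypothetical example: the name and parameters are illustrative only. */
double mixed_sum(int a, float b, long c, double d)
{
    /* Per the x86-64 System V convention:
     *   a -> EDI   (1st integer arg)
     *   b -> XMM0  (1st floating-point arg)
     *   c -> RSI   (2nd integer arg)
     *   d -> XMM1  (2nd floating-point arg)
     * The double return value comes back in XMM0. */
    return a + b + c + d;
}
```

A caller written in assembly would load EDI, RSI, XMM0, and XMM1 accordingly before the call instruction, in any order.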

For example, double foo(float a, double b) { return a+b; } compiles to (Godbolt)

foo:
        cvtss2sd  xmm0, xmm0   # implicit conversion of a to double
        addsd     xmm0, xmm1   # add, leaving the return value in XMM0
        ret

Compilers are good at following the ABI, so compiling an example with optimization enabled is a good way to see which incoming registers or memory locations hold which args. Assigning values to a volatile global can be a good way to stop the compiler from optimizing something away. Alternatively, or in addition, compile a caller that passes a different value for each arg.
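
A minimal sketch of that technique might look like the following. All names here (`sink_f`, `sink_d`, `probe`, `caller`) are hypothetical and exist only for illustration.

```c
/* Hypothetical names (sink_f, sink_d, probe, caller): illustration only. */
volatile float  sink_f;   /* stores to volatile objects must be kept */
volatile double sink_d;

/* Compile with optimization (e.g. gcc -O2 -S) and look at which registers
 * feed the two stores: b should arrive in XMM0 and d in XMM1. */
void probe(int unused, float b, double d)
{
    (void)unused;          /* arrives in EDI, deliberately ignored */
    sink_f = b;
    sink_d = d;
}

/* A caller with a distinct constant per arg shows how each one is set up. */
void caller(void)
{
    probe(1, 2.0f, 3.0);
}
```

If the compiler inlines `probe` into `caller`, the argument setup disappears from the output; marking `probe` with the GCC/Clang extension `__attribute__((noinline))` should keep the call visible.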

The calling convention for parameter passing is specified in section 3.2.3 of the System V Application Binary Interface for AMD64 documentation (PDF).

I'm not sure if the documentation can be legally quoted here, but I can at least paraphrase.

Classification Types

First, the documentation defines eight different classifications for parameter values:

INTEGER: integral types (and pointers) that fit into one of the general-purpose registers.
SSE: types that fit into a vector register.
SSEUP: types that fit into a vector register and are passed and returned in its upper bytes (the upper parts of values of 128 bits or more).
X87: types that are returned via the x87 FPU.
X87UP: the upper bytes of types that are returned via the x87 FPU.
COMPLEX_X87: complex types that are returned via the x87 FPU.
NO_CLASS: the initial value in the classification algorithm; also used for padding and for empty structures and unions.
MEMORY: types that are passed and returned in memory, on the stack.

Classification Rules

It next defines how C types fit into these classifications:

_Bool, char, short, int, long, long long, and pointers are classified as INTEGER and will use those registers.
float, double, _Decimal32, _Decimal64, and __m64 are classified as SSE and will use those registers.
__float128, _Decimal128, and __m128 are split in half, storing the least significant bytes/bits in SSE and the most significant bytes/bits in SSEUP.
__m256 is split into four 64-bit (8 byte) values, with the least significant bytes being stored as SSE and the rest as SSEUP.
__m512 is similarly split into 64-bit (8 byte) chunks, with the least significant bytes stored as SSE and everything else as SSEUP.
For long double, the 64-bit mantissa is classified as X87, and the 16-bit exponent plus 6 bytes of padding (64 bits in total) is classified as X87UP.
__int128 is essentially treated as two long values classified as INTEGER, with the first half holding the low bits and the second half holding the high bits, so it needs two general-purpose registers. For classification purposes it can be understood as if it were defined as a struct (except that an __int128 stored in memory must be 16-byte aligned):
```
 typedef struct {
   long low_bits, high_bits;
 } __int128;
```
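complex double and complex float types are treated as a struct of two members, the real component followed by the imaginary component, each classified as SSE. A complex double therefore spans two 8-byte chunks (two XMM registers), while both halves of a complex float fit in a single 8-byte chunk (one XMM register). They can be understood as if they were defined as a struct like so: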
```
 typedef struct {
   double real, imaginary;
 } complex_double;
```
complex long double values are classified as COMPLEX_X87.
The logic for structs, unions, and arrays is fairly complicated; consult the documentation linked above for details. In a nutshell, a recursive algorithm classifies each 8-byte chunk (an "eightbyte") of an aggregate from the fields it contains, and that classification decides how the value is passed.
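
The struct analogies for __int128 and complex double can be checked with a short sketch. It assumes GCC or Clang on a little-endian 64-bit target such as x86-64 Linux (__int128 is a compiler extension), and the helper name `layouts_match_analogy` is hypothetical.

```c
#include <complex.h>
#include <string.h>

/* Hypothetical helper for illustration: returns 1 if the in-memory layouts
 * match the struct analogies (assumes GCC/Clang, little-endian, 64-bit long). */
int layouts_match_analogy(void)
{
    /* __int128: two 8-byte halves, low half first. */
    __int128 wide = ((__int128)7 << 64) | 5;
    long halves[2];
    if (sizeof wide != sizeof halves)
        return 0;
    memcpy(halves, &wide, sizeof halves);

    /* C99 lays out complex double like double[2]: real, then imaginary. */
    double complex z = 1.5 - 2.0 * I;
    double parts[2];
    memcpy(parts, &z, sizeof parts);

    return halves[0] == 5 && halves[1] == 7
        && parts[0] == 1.5 && parts[1] == -2.0;
}
```

This only checks the memory layout; which registers a compiler actually picks is easiest to confirm by reading its assembly output.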

Argument Passing

Now that we have a classification system and a recursive algorithm for dealing with structs, unions, and arrays, we apply this system and algorithm to the parameters to a function, which consists of the following steps for each argument:

If it's a MEMORY object, write it to the stack.
If it's an INTEGER, use the next available register from %rdi, %rsi, %rdx, %rcx, %r8, and %r9.
If it's SSE, use the next available register in the range %xmm0 to %xmm7.
If it's SSEUP, use the next available 64-bit chunk of the last-used %xmm register for SSE types.
If it's X87, X87UP, or COMPLEX_X87, it's passed in memory.

Rinse and repeat for all argument values. If there are not enough registers left for every part of an argument, the whole argument is passed on the stack instead.
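
The steps above can be sketched as a small simulation. This is a deliberately simplified model, not the ABI's full algorithm: it assumes every argument occupies a single 8-byte chunk and ignores SSEUP, X87, and aggregates. The names `arg_class` and `assign_locations` are hypothetical.

```c
#include <stddef.h>

/* Simplified classes: one 8-byte chunk per argument, no SSEUP/X87 handling. */
enum arg_class { ARG_INTEGER, ARG_SSE, ARG_MEMORY };

/* Hypothetical helper: writes where each argument goes into out[i]. */
void assign_locations(const enum arg_class *classes, size_t n,
                      const char **out)
{
    static const char *const int_regs[] = {
        "rdi", "rsi", "rdx", "rcx", "r8", "r9"
    };
    static const char *const sse_regs[] = {
        "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"
    };
    size_t next_int = 0, next_sse = 0;   /* two independent counters */

    for (size_t i = 0; i < n; i++) {
        if (classes[i] == ARG_INTEGER && next_int < 6)
            out[i] = int_regs[next_int++];
        else if (classes[i] == ARG_SSE && next_sse < 8)
            out[i] = sse_regs[next_sse++];
        else
            out[i] = "stack";            /* MEMORY, or registers exhausted */
    }
}
```

For the classes INTEGER, SSE, INTEGER, SSE this yields rdi, xmm0, rsi, xmm1, which is why interleaving argument types does not waste registers.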

TL;DR: The System V ABI defines a non-trivial but fairly straightforward algorithm for passing different types of data.

Upvotes: 8
