Lars Malmsteen

Reputation: 768

Very simple inline assembly program gives segmentation fault:11 on gcc-5.0

#include <stdio.h>

int square (int n) {
  __asm__("mov %eax, n"
      "mul %eax");
}

int main(void) {
  printf("\nSquare of 4 is %i", square(4));
  /* Calling square gives Segmentation fault: 11 error */
  return 0;
}

When I compile this code on an iMac (Core 2 Duo) with Mac OS X 10.7 and gcc-5.0.0 using gcc -o assem -DDEBUG=9 -ansi -pedantic -Wall -g assem.c, it compiles with a warning:

assem.c: In function ‘square’:
assem.c:6:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
assem.c:4:Can't relocate expression. Absolute 0 assumed.

Compilation finished at Mon Jul 25 18:23:47

When I run it, it gives Segmentation fault: 11.

How to fix it?

Note: I've browsed about 10 questions about Segmentation fault: 11, assembly, and inline assembly; none of them helped.

Update

When I change the inline-assembly to: asm ("imul %0, %0" : "+r"(n)); return n; The compiler gives this error:

assem.c: In function ‘square’:
assem.c:4:1: warning: implicit declaration of function ‘asm’ [-Wimplicit-function-declaration]
 asm ("imul %0, %0" : "+r"(n)); 
 ^
assem.c:4:20: error: expected ‘)’ before ‘:’ token
 asm ("imul %0, %0" : "+r"(n)); 
                    ^
assem.c:7:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^

Compilation exited abnormally with code 1 at Mon Jul 25 18:46:49

When I change the assembly to asm ("imul %0, %0" : "+r"(n)); the compiler gives an error similar to the one above.

Update 2 (25.Jul.2022)

In an attempt to solve the issue without radically changing the square function, I've copied part of the code from Peter Cordes's comment with a clang version of it:

#include <stdio.h>

int quadrat (int n) {
  asm { mov eax, n / imul eax,eax / mov n, eax / };
}

int main(void) {
  printf("\nSquare of 4 is %i\n", 4*4);
  return 0;
}

I do have clang-3.7.1 on my Mac OS X machine:

>clang -v
clang version 3.7.1 (tags/RELEASE_371/final)
Target: x86_64-apple-darwin11.4.2
Thread model: posix

I've tried to compile it using:

clang -fasm-blocks ass-clang.c

Note: I normally don't ever use clang

The code didn't compile:


ass-clang.c:4:20: error: unexpected token in argument list
  asm { mov eax, n / imul eax,eax / mov n, eax / };

Update #3 (Specific to the bountied question)

How to fix this code

int square (int n) {
  __asm__("mov %eax, n"
      "mul %eax");
}

without altering its basic structure? That is, the n will be moved to eax (or to any other register, if that's necessary) then that register's value will be multiplied by itself, preferably using the mul command and finally the result will be returned preferably without using the return command. In other words, I need a fix to the code, not a rewrite. For instance I consider this to be rewrite:

asm ("imul %0, %0" : "+r"(n)); return n;

Besides, this rewrite is not intuitive. What's that : ? What's that "+r" doing there, is it assigning the Unix read permissions :)

Upvotes: 1

Views: 445

Answers (1)

David Wohlferd

Reputation: 7483

preferably without using the return command.

If you want your C code to return a value, it's going to require using return. That's just how C works. That's why Peter was suggesting the following, with clang -fasm-blocks to allow MSVC's style of inline asm. This compiles with clang on Godbolt, to inefficient asm that stores n to the stack so it can be a memory operand in the asm block (because this style of inline asm requires that inefficiency).

int square(int n)
{
    asm { 
    mov eax, n 
    imul eax,eax 
    mov n, eax }
    
    return n;
}

If you were writing this code in pure asm, the 'eax' register is used to hold the return value from a function. So you can 'cheat' your way into not using the C return statement, but only if you tell the compiler that you'll be handling the act of returning from the function yourself. That's what Peter was talking about when he suggested using __attribute__((naked)).

This attribute informs the compiler that it should assume that everything is being handled 100% by assembler code in the function (including the return value and actually returning). That's what he was talking about when he said you could omit the C return statement and just use an x86 ret instruction once you've populated eax. If you mark a function as naked, you cannot use a return statement. The function must only contain asm, not C code (see the docs). The compiler keeps its hands off entirely, so you can be sure that RSP is pointing at a return address when your asm statement or block gets control. But it also means you can't use named local variables or arguments in your asm, so you have to implement the calling convention yourself.

In essence naked means you are writing a function in assembly, but using the C compiler to give it a name and prototype, and to make it part of your C source file. Not an approach I'd recommend. If you want to write an assembly function, write an assembly function in assembly and link it to your C code. Trying to cram the two together usually just results in confusion.

But naked would allow you to omit the return statement and use a ret instruction, if for some reason that's essential.
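For illustration only, here is a rough sketch of what a naked version might look like. It assumes GCC 8 or later (or clang), the default AT&T assembler syntax, and an x86-64 target using the System V calling convention, where the first integer argument arrives in edi. The name square_naked and the plain-C fallback for other targets are my own additions, not something from Peter's comment.

```c
#if defined(__x86_64__) && !defined(_WIN32)
/* naked: the compiler emits no prologue, no epilogue and no ret of its own. */
__attribute__((naked)) int square_naked(int n)
{
    /* Basic asm (no operands), so register names take a single %.
       Assumed convention: first int argument in edi, return value in eax.
       n cannot be referenced by name in here. */
    __asm__("movl %edi, %eax\n\t"
            "imull %eax, %eax\n\t"
            "ret");
}
#else
/* Fallback for targets where the register assumptions above do not hold. */
int square_naked(int n) { return n * n; }
#endif
```

Called as square_naked(4), this leaves the result in eax and returns through the ret instruction, with no C return statement involved. It also hard-codes one calling convention, which is a good part of why I wouldn't recommend it.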

What's that : ?

If you've read the docs (you have read the docs, right?), you will have seen:

  • The first colon delimits the 'output' constraints.
  • The second colon delimits the 'input' constraints.
  • The third colon delimits the 'clobbers'.

The fact that @fuz only uses one colon means that he has no input-only constraints and uses no clobbers.
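To make the three sections concrete, here is a small sketch that uses all of them. The function add_ints is a made-up example (it is not from the question), it assumes the default AT&T syntax on x86, and it falls back to plain C on other targets. I've spelled the keyword __asm__ because plain asm is not recognized under -ansi, which is where the "implicit declaration of function 'asm'" warning in the question comes from.

```c
int add_ints(int a, int b)
{
    int sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %0\n\t"     /* sum = a  */
            "addl %2, %0"         /* sum += b */
            : "=&r"(sum)          /* after the 1st colon: outputs (%0)       */
            : "r"(a), "r"(b)      /* after the 2nd colon: inputs (%1 and %2) */
            : "cc");              /* after the 3rd colon: clobbers (flags)   */
#else
    sum = a + b;                  /* non-x86 fallback, illustration only */
#endif
    return sum;
}
```

The & in "=&r" marks the output as written before all inputs have been read, so the compiler will not place sum in the same register as a or b.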

What's that "+r" doing there

Looking at the docs, "r" means "move the value into a register before invoking the asm instructions." And the + means that the value is being updated (as opposed to = which would mean written-but-not-read).
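A side-by-side sketch may help here. The two hypothetical functions below (the names are mine) should behave the same: the first uses "+r", the second uses a write-only "=r" output and has to pass the starting value in separately through a "0" input, which ties it to the same register as operand 0. This assumes x86, with a plain-C fallback elsewhere.

```c
/* "+r": n is both read and written by the asm. */
int square_rw(int n)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("imul %0, %0" : "+r"(n));
#else
    n = n * n;                    /* non-x86 fallback, illustration only */
#endif
    return n;
}

/* "=r": result is write-only, so the input value must be supplied
   separately; "0" places n in the same register as operand 0. */
int square_tied(int n)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("imul %0, %0" : "=r"(result) : "0"(n));
#else
    result = n * n;               /* non-x86 fallback, illustration only */
#endif
    return result;
}
```

Using "=r"(n) on its own would be a bug for this template: the compiler would be free to assume the asm never reads n, and could skip loading it.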

the n will be moved to eax (or to any other register, if that's necessary)

By writing "+r"(n), we've already moved the value from n into a register, and no longer need to include a mov instruction in the template. Since n may already be in a register from earlier code, this probably saves doing an extra mov instruction.

Which register will it pick? Since we didn't specify, the compiler will pick whichever one is most efficient. Since we don't know which one that will be (and it may change from compile to compile), we use %0 to refer to the first constraint, %1 for the second (if we had one), etc.
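As a sketch of how the numbering works with more than one operand, here is a hypothetical two-argument multiply (not from the question). It assumes x86 with the default AT&T operand order (source first, destination second) and falls back to plain C elsewhere.

```c
int multiply(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("imul %1, %0"     /* AT&T order: %0 = %0 * %1          */
            : "+r"(a)         /* %0: the first constraint listed   */
            : "r"(b));        /* %1: the second constraint listed  */
#else
    a = a * b;                /* non-x86 fallback, illustration only */
#endif
    return a;
}
```

Nowhere does the template name eax, edi or any other specific register; the compiler substitutes whatever it picked for %0 and %1.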

preferably using the mul command

Well, there's a couple problems with that. The mul instruction requires that you use the eax register. Forcing the compiler to explicitly use eax might generate less efficient code if it's using it for something else.

But more importantly, mul writes 2 registers. When you multiply two registers, the largest possible result is twice as wide as either input. mul is a widening multiply that puts the output in edx:eax.

Your original code makes no provision for values that big. But you can't just ignore the fact that the edx register is getting changed. If you don't tell the compiler that you're changing the contents of that register, it's not going to know. How can it not know? After all, the 'mul' instruction is right there, right? From the docs:

GCC does not parse the assembler instructions themselves and does not know what they mean or even whether they are valid assembler input.

Which means if you alter a register without letting the compiler know (via output constraints or clobbers), you can get a big mess if the compiler was using it for something else. By contrast, imul with more than one operand only outputs a single register, which is more consistent with your original code. (x86 has no non-widening form of mul, and none is needed, because the low half of a full multiply is the same whether the inputs are treated as signed or unsigned.)
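If mul really is a requirement, a sketch along these lines should stay close to the original structure while telling the compiler what is going on. The name square_mul is mine, and it assumes x86 with the default AT&T syntax (plain-C fallback elsewhere). The "a" constraint pins n to eax, and the clobber list declares that edx gets overwritten.

```c
int square_mul(int n)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("mull %%eax"      /* edx:eax = eax * eax; in extended asm a
                                 literal register name needs %%          */
            : "+a"(n)         /* "a" puts n in eax, read and written     */
            :                 /* no input-only operands                  */
            : "edx", "cc");   /* the high half of the product lands in
                                 edx, and the flags change               */
#else
    n = n * n;                /* non-x86 fallback, illustration only */
#endif
    return n;
}
```

The C return statement is still there; the asm only computes the value.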

For instance I consider this to be rewrite:

I'm not sure I agree.

You want to move the value into a register; "+r" does that for you. You can use mul instead of imul if you insist, but you're going to be forcing the compiler to free up the edx register for you in addition to eax, and to run a slower instruction (more uops to write a second register). Why make your code less efficient?

The choice is between "clobbering" the edx register or using imul. Which is the smaller re-write?

Besides, this rewrite is not intuitive.

Ok, now there you've got me.
Writing inline asm is hard (which is why I recommend that you don't do it).

But this "extended asm" approach is how gcc, clang, Intel, etc. all do it. Microsoft (not surprisingly) went a different way. But their solution turned out to be so hard (or implemented unmaintainably in their compiler) that they decided not to support inline asm for 64-bit code. At all. So if you insist on writing inline asm (the most complex way to use assembly language with C), you'll need to learn how it works.

Fuz's approach generates the most efficient code. It also allows the code to be inlined, which most of the other approaches described here do not. It uses the minimum number of registers, leaving these precious resources available for the compiler to use for other purposes.

In summary, I don't know how you're going to find a cleaner, more efficient inline asm solution than this:

int square (int n) {
    asm ("imul %0, %0" : "+r"(n)); 
    return n;
}

Compare how it compiles to how the MSVC-style inline asm version compiles, on Godbolt with clang 14:

# this version, asm ("imul %0, %0" : "+r"(n));
square_gnu(int):
        mov     eax, edi
        imul    eax, eax
        ret
# asm { ... }  version that needs clang -fasm-blocks
# clang -O3 -fasm-blocks
square_msvc(int):
        mov     dword ptr [rsp - 4], edi     # compiler-generated store, into the red-zone

        mov     eax, dword ptr [rsp - 4]     # mov eax, n
        imul    eax, eax                     # square n
        mov     dword ptr [rsp - 4], eax     # mov n, eax

        mov     eax, dword ptr [rsp - 4]     # compiler-generated reload
        ret

BTW, MSVC documents support for leaving a value in EAX at the end of an asm{} block, and then falling off the end of the function without a C return statement. This works in MSVC even when inlining a function containing an asm{} block. (Presumably enough programmers abused a "happens to work" that MS made it official.) This reduces the inefficiency of getting a result out of an asm{} block, but doesn't help with the store/reload to get a value in.

But in clang -fasm-blocks, it only happens to work by chance, breaking when inlined. See Does __asm{}; return the value of eax?

Upvotes: 4
