Is ASM just a MACRO for ML, does it have standardized directives? What about GAS?

Question

I took some courses where MIPS and x86 assembly were taught.

For MIPS we used the MARS simulator.

For x86 we wrote code into templated .S files whose details they basically ignored, so we basically only wrote the bodies of subroutines...

Both shared some keywords:

.data
.text

Others we did not use in MARS:

.globl
.section

Now I need to learn ARM by myself and I don't really know how to start coding.

What I understand is that in MARS we wrote code that was compiled to ML and sent directly to the simulated CPU's memory as if it were a microcontroller (which, by the way, I have done with a Microchip PIC). With x86, however, we ran on top of a host OS, which was Linux, and needed the compiler to add a "header" and assign VM to the program, so it was not trivial.

Probably the fact is that they taught us ASM more like a MACRO for ML and not as a language that has its own low-level directives and features and that needs to be compiled.

Googling I found GAS which might be what we did with x86...

So the questions are:

Is GAS what is used to write ASM on *NIX systems?
Are the keywords mentioned above inherent to ASM, or do they come from GAS?

old_timer · Accepted Answer

You don't need to make it that complicated.

int main ( void )
{
    return(555);
}

gcc -O2 -c so.c -o so.o
objdump -D so.o

0000000000000000 <main>:
   0:   b8 2b 02 00 00          mov    $0x22b,%eax
   5:   c3                      retq

Or you could look at the assembly generated. I prefer to disassemble, so now I can write:

.globl main
main:
    mov    $0x22b,%eax
    retq

as so.s -o so.o
gcc so.o -o so
./so

And of course nothing is printed, but rename it to a function and call that from C:

so.s

.globl fun
fun:
    mov    $0x22b,%eax
    retq

so.c

#include <stdio.h>

int fun ( void );
int main ( void )
{
    printf("%u\n",fun());
    return(0);
}

as so.s -o fun.o
gcc so.c fun.o -o so
./so
555

And of course you can then complicate it as much as you like beyond that.

gcc outputs GNU assembler source, so:

int fun ( void )
{
    return(333);
}

gcc -O2 -save-temps -c so.c -o so.o
cat so.s
    .file   "so.c"
    .section    .text.unlikely,"ax",@progbits
.LCOLDB0:
    .text
.LHOTB0:
    .p2align 4,,15
    .globl  fun
    .type   fun, @function
fun:
.LFB0:
    .cfi_startproc
    movl    $333, %eax
    ret
    .cfi_endproc
.LFE0:
    .size   fun, .-fun
    .section    .text.unlikely
.LCOLDE0:
    .text
.LHOTE0:
    .ident  "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609"
    .section    .note.GNU-stack,"",@progbits

And although the compiler output often has an excess of directives (useful for debuggers and other things, but not always used or required), you can use it to help, to some extent, learn GNU assembler for this target (x86-64). You of course still need the documentation from the processor vendor (Intel in this case), understanding that the syntax in that document is not necessarily the syntax used by any particular toolchain that you have or will use. You have to be multilingual there, but you see what the instructions are, what they do, their limits, etc.

MARS and similar environments are quite useful for teaching and are often designed for that purpose, leaving out a lot of the traps that you can fall into. The goal is to learn the instruction set by playing with a simulator and get your feet wet in assembly language. I am not a fan of an assembly-only interface; for educational purposes I think the student should generate and see the machine code, and perhaps within that sim you can. I have only used it for SO questions; I use real or simulated MIPS processors if I want to play with MIPS.

Assembly language is specific to the tool, not the target. Assume that each assembler for any target has its own assembly language, and if there happens to be overlap then so be it.

    global  fun
fun:
    mov    eax, 333
    ret

nasm so.s -felf64 -o so.o
gcc so.c so.o -o so
./so
333

There is the well-known Intel vs AT&T thing, but those are not syntaxes; that is just swapping the source and destination relative to the Intel standard. nasm doesn't like .globl (try it); it likes global without the dot.

    .globl  fun
fun:
    movl    %eax, $333
    ret

nasm so.s -felf64 -o so.o
so.s:1: error: attempt to define a local label before any non-local labels
so.s:1: error: parser: instruction expected
so.s:3: error: parser: instruction expected

    globl  fun
fun:
    movl    %eax, $333
    ret

nasm so.s -felf64 -o so.o
so.s:1: error: parser: instruction expected
so.s:3: error: parser: instruction expected

    globl  fun <-- note this is line 1
fun:
    mov    %eax, $333 <--- this is line 3
    ret

nasm so.s -felf64 -o so.o
so.s:1: error: parser: instruction expected
so.s:3: error: expression syntax error


    globl  fun
fun:
    mov    eax, 333
    ret

nasm so.s -felf64 -o so.o
so.s:1: error: parser: instruction expected

    global  fun
fun:
    mov    eax, 333
    ret

And nasm is happy. Now feed that same file to gas:

as so.s -o so.o

so.s: Assembler messages:
so.s:1: Error: no such instruction: `global fun'
so.s:3: Error: too many memory references for `mov'

    .global  fun
fun:
    mov    333, eax
    ret

so.s: Assembler messages:
so.s:3: Error: too many memory references for `mov'

    .global  fun
fun:
    mov    $333, eax
    ret

so.s: Assembler messages:
so.s:3: Error: no instruction mnemonic suffix given and no register operands; can't size instruction

    .global  fun
fun:
    movl    $333, eax
    ret

And as is happy, BUT this is broken: it thinks eax is a label (a memory address) to be filled in later.

0000000000000000 <fun>:
   0:   c7 04 25 00 00 00 00    movl   $0x14d,0x0
   7:   4d 01 00 00 
   b:   c3                      retq 

    .global  fun
fun:
    movl    $333, %eax
    ret

0000000000000000 <fun>:
   0:   b8 4d 01 00 00          mov    $0x14d,%eax
   5:   c3                      retq  

    .global  fun
fun:
    movl    $333, %eax
    retq

0000000000000000 <fun>:
   0:   b8 4d 01 00 00          mov    $0x14d,%eax
   5:   c3                      retq 

    .global  fun
fun:
    mov    $333, %eax
    retq

0000000000000000 <fun>:
   0:   b8 4d 01 00 00          mov    $0x14d,%eax
   5:   c3                      retq

nasm:

    global  fun
fun:
    mov    eax, 333
    ret

0000000000000000 <fun>:
   0:   b8 4d 01 00 00          mov    $0x14d,%eax
   5:   c3                      retq

Same machine code, different assembly language in more ways than just reversing the source and destination (I used objdump to disassemble so that is why you see that syntax).

gas takes .globl or .global. Since the size of the mov is obvious from the eax register, which is 32 bits, the suffix isn't needed; movl and mov apparently both work with the binutils I have. Likewise ret vs retq produced the same instruction.

The joys of assembly language, especially with a painful target like x86 (the last instruction set you want to learn; there is a list of more useful/better ones).

But you can see that assembly language can and does differ for the same target and the same instructions, based on the tool used. And something like MARS starts to make even more sense for that use case.

You won't go wrong learning the gcc/binutils (GNU) tools, as you can use them on Windows, Mac, Linux, BSD, etc., and everything but the system calls and possibly the binary file formats is going to be the same experience (okay, linker scripts and OS-specific stuff will differ).

Depending on the target there may be other good choices too. nasm is popular with folks who learned Intel syntax in the old days, and I suppose with others; and for code that has been lying about for a while that gas pukes on, you might have half a chance with nasm.

And one or the other or both have command-line options for the Intel vs AT&T source/destination swapping (gas, for example, also accepts a .intel_syntax noprefix directive in the source).
