Arnab Das

Reputation: 55

Difficulty understanding how compilers and assembly language piece together

This is more of a conceptual question, but I am learning about embedded systems for an upcoming project. I have been working through the tutorial on Tutorialspoint:

https://www.tutorialspoint.com/embedded_systems/es_tools.htm

This webpage talks about compilers, assemblers, and coupling.

BASICALLY: How does the assembly process work with compilers, if at all? Where and how can I piece this information together? What am I not getting?

Upvotes: 0

Views: 90

Answers (1)

old_timer

Reputation: 71506

Try it yourself using the GNU tools:

#define FIVE 5

extern unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
    return(more_fun(FIVE)+1);
}

Build it with -save-temps so gcc keeps the intermediate files. gcc first needs to pre-process the source to pull in includes and replace defines/macros:

# 1 "so.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "so.c"




extern unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
    return(more_fun(5)+1);
}

That gets fed to the actual compiler. gcc the program is not the compiler; it is a program that calls other programs. The compiler's output is assembly language:

    .arch armv5t
    .fpu softvfp
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 2
    .eabi_attribute 30, 2
    .eabi_attribute 34, 0
    .eabi_attribute 18, 4
    .file   "so.c"
    .text
    .align  2
    .global fun
    .syntax unified
    .arm
    .type   fun, %function
fun:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    push    {r4, lr}
    mov r0, #5
    bl  more_fun
    add r0, r0, #1
    pop {r4, pc}
    .size   fun, .-fun
    .ident  "GCC: (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
    .section    .note.GNU-stack,"",%progbits

gcc then calls the assembler to assemble that into an object, which contains as much of the machine code as the assembler can resolve, plus ideally other information for debugging and linking. Using a disassembler we can see the code produced by the assembler:

Disassembly of section .text:

00000000 <fun>:
   0:   e92d4010    push    {r4, lr}
   4:   e3a00005    mov r0, #5
   8:   ebfffffe    bl  0 <more_fun>
   c:   e2800001    add r0, r0, #1
  10:   e8bd8010    pop {r4, pc}

The bl 0 in the middle, the call to the more_fun function, was not resolved because that code was not part of the original C source file. A placeholder is put there, and the linker comes along later to link the objects together and fill it in. If you don't specify -c, gcc will also call the linker for you.
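To watch the linker fill in that placeholder, some other object has to define more_fun. Here is a minimal sketch, assuming a hypothetical more_fun that simply doubles its argument (the body is my invention; the example above only declares it). fun is repeated from so.c so the block compiles on its own, although in a real build the two functions would sit in separate source files and meet only at link time.

```c
#define FIVE 5

/* Hypothetical definition, not from the original example. In a real
   project this would live in its own source file and be compiled to a
   separate object that the linker later combines with so.o. */
unsigned int more_fun ( unsigned int x )
{
    return(x*2);
}

/* Same function as in so.c above. */
unsigned int fun ( void )
{
    return(more_fun(FIVE)+1);
}
```

Add a main that calls fun() and the linker has everything it needs to produce an executable. With more_fun in a separate file, the bl in fun's object is the spot the linker patches once it knows where more_fun ended up.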

Most "toolchains" work this way; it's the sane way to do it. For just-in-time compilation, or for "why do you climb mountains? because they are there" reasons, some compilers go more directly to machine code, but even llvm doesn't do that, and it claims to be a JIT, although its primary use is otherwise. A toolchain doesn't have to use separate executables; there are various ways to solve the problem.

I don't remember if the site you linked is on the list of sites you should avoid at all costs. There are one or more like it that have some very bad information that is confusing and wrong. That page was neither bad nor confusing, but I only skimmed it.

Decompilers don't really exist in the form folks would like. As you can see in this simple example, information from the original code is lost in compiling; you can't completely recreate the source from the binary. It is pretty easy to make similar simple examples that demonstrate this.
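One such example, as a rough sketch: two functions written differently that compute the same thing. The names fun_a and fun_b are hypothetical, and more_fun is given a made-up body only so the block compiles on its own. An optimizing compiler will typically emit the same or nearly the same instructions for both, so nothing in the binary says whether the source used a macro, a literal, or a temporary variable.

```c
#define FIVE 5

/* Hypothetical stand-in so the block compiles standalone. Keeping it
   extern, as in so.c, would stop the compiler from inlining it. */
unsigned int more_fun ( unsigned int x )
{
    return(x*2);
}

/* Original style: macro constant, call, then add one. */
unsigned int fun_a ( void )
{
    return(more_fun(FIVE)+1);
}

/* Different source text: literal constant, a named temporary,
   and an increment. The macro name and the variable name are
   gone by the time the assembler sees the code. */
unsigned int fun_b ( void )
{
    unsigned int result;

    result = more_fun(5);
    result++;
    return(result);
}
```

Compiling this with -S and comparing the two function bodies in the assembly output is an easy way to check how much of the difference survives on your compiler and optimization level.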

Upvotes: 1
