Piamoon

Reputation: 105

Can the _start symbol in assembly be replaced with another name?

I started learning assembly two days ago and couldn't find answers to these questions online, so I would be glad if you could help. I learned that the starting point of the program must be specified as global _start. I have two questions. First, in all the code I have seen, global _start was written inside the text section. Is it possible to write global _start outside the text section? Second, can the name _start in global _start be changed? For example, if I write global _asd or global qwe to define the starting point of the program, will I get a syntax error?

Note: I'm on Ubuntu Linux, using nasm as the assembler and ld as the linker.

Upvotes: 2

Views: 2704

Answers (2)

old_timer

Reputation: 71606

This is a GNU ld question, not a nasm one. When ld links, it looks for that symbol to mark as the entry point. Your question is vague about the target, but mentioning nasm implies x86, and Linux is of course not vague.

Since the program you are building is loaded by an operating system like Linux, the entry point is critical, unless of course you manipulate the binary in some way or tell the linker in some way what your entry point is. If the program is not entered at the right place, it will not operate properly and will quite likely simply crash. You can't just jump into the middle of a program and hope for success, much less start executing at .data or anything else that is not code.

As mentioned in the comments (please upvote them), you can change the entry point label if you don't want to use _start. If you do not define _start, ld will give a warning and continue, but unless you give it another label, the program is at risk of being entered in the wrong place.

If this were bare metal, for example a microcontroller, you would not have an operating system loading the program into memory and entering it at whatever address in the binary you specify. Instead you are governed by the hardware/logic. You have to conform to its rules and craft the code, linker script, command line, etc. so that the binary matches the entry point the logic defines. In that case you can do without _start altogether and take whatever default ld puts in its output binary. That binary is at some point used to program the flash/ROM in the MCU, a step that strips all of that knowledge, including the entry point, from the binary file.

I am not so sure about nasm, but assume you are always in some section, so the label will land somewhere. Be careful if the label is not in a .text section and you are using it as the entry point (by default, by not specifying something else). Even if it is the last line before a .text section declaration, the linker is going to put that label with the other labels in the section where it lands. Because it sits in the file just before a .text declaration rather than just after, it may end up at an address nowhere near the code that follows it in the source file.
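To make the "label lands in whatever section is current" point concrete, here is a toy Python model. It is not how nasm is implemented. It walks a tiny NASM-like source, tracks the current section, and records which section each label ends up in. The function name `assign_label_sections` and the one-byte-per-line sizing are hypothetical simplifications. The default of `.text` before any `section` directive is an assumption that, to my understanding, matches NASM's ELF output.

```python
def assign_label_sections(lines, default_section=".text"):
    """Toy model: map each label to (section, offset) in a NASM-like source.

    Assumptions (hypothetical, for illustration only):
      * every line that is not a directive or a label emits exactly one byte
        (true for `nop` and `db 1`, not for real code in general);
      * `global` only exports a name; it never moves the label anywhere.
    """
    section = default_section
    offsets = {section: 0}
    labels = {}
    for raw in lines:
        line = raw.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        words = line.split()
        if words[0] == "section":
            section = words[1]
            offsets.setdefault(section, 0)
        elif words[0] == "global":
            pass  # visibility only; placement is decided by the label line
        elif line.endswith(":"):
            labels[line[:-1]] = (section, offsets[section])
        else:
            offsets[section] += 1  # hypothetical: one byte per instruction/data line
    return labels


# `global _start` and `_start:` written just before `section .text`:
misplaced = [
    "section .data",
    "db 1",
    "global _start",
    "_start:",        # still in .data: the label follows the current section
    "section .text",
    "nop",
]
print(assign_label_sections(misplaced))  # {'_start': ('.data', 1)}
```

Moving the `global _start` / `_start:` lines below `section .text` makes the model report `('.text', 0)`. That is the placement you want for an entry point.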

Some examples using the GNU tools (an ARM toolchain here). The question is ld-specific, so the target and assembler don't really matter.

MEMORY
{
    one   : ORIGIN = 0x1000, LENGTH = 0x1000
    two   : ORIGIN = 0x2000, LENGTH = 0x1000
    three : ORIGIN = 0x3000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > one
    .data   : { *(.data*)   } > two
    .bss    : { *(.bss*)    } > three
}

.globl _start
_start:
    nop

Building and using readelf:

  Entry point address:               0x1000
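The "Entry point address" that readelf prints is simply the `e_entry` field of the ELF header. For a static executable like these, this is the address where execution starts. The symbol name (`_start` or anything else) plays no part at that stage. Below is a minimal Python sketch of reading that field. The function name is my own, and the offsets follow the ELF header layout: `e_entry` comes after the 16-byte `e_ident`, `e_type`, `e_machine` and `e_version`.

```python
import struct


def elf_entry_point(header: bytes) -> int:
    """Return e_entry from the first bytes of an ELF file.

    This is the value readelf -h shows as "Entry point address".
    """
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class, data = header[4], header[5]  # EI_CLASS, EI_DATA
    endian = {1: "<", 2: ">"}[data]         # 1 = little-endian, 2 = big-endian
    # e_entry is at offset 24 for both classes: 4 bytes for ELF32, 8 for ELF64.
    fmt = {1: "I", 2: "Q"}[elf_class]
    (entry,) = struct.unpack_from(endian + fmt, header, 24)
    return entry
```

For example, `hex(elf_entry_point(open("so.elf", "rb").read(64)))` should agree with the readelf line above for the same file.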

Now if I use this source:

.globl here
here:
    nop

.globl _start
_start:
    nop

.globl there
there:
    nop


00001000 <here>:
    1000:   e1a00000    nop         ; (mov r0, r0)

00001004 <_start>:
    1004:   e1a00000    nop         ; (mov r0, r0)

00001008 <there>:
    1008:   e1a00000    nop         ; (mov r0, r0)

  Entry point address:               0x1000

That may be confusing: _start is defined at 0x1004, yet the entry point is 0x1000, the start of .text. But let's move on.

arm-linux-gnueabi-ld -nostdlib -nostartfiles -e _start -T so.ld so.o -o so.elf

  Entry point address:               0x1004

Or instead, in the linker script:

ENTRY(_start)
MEMORY
{
    one   : ORIGIN = 0x1000, LENGTH = 0x1000
...


  Entry point address:               0x1004

But I can also do this:

.globl here
here:
    nop

    nop

.globl there
there:
    nop

ENTRY(there)
MEMORY
{
    one   : ORIGIN = 0x1000, LENGTH = 0x1000

  Entry point address:               0x1008

Note that the linker didn't warn about the missing _start.

If I now remove ENTRY() from the linker script:

  Entry point address:               0x1000

But if I do this:

arm-none-eabi-ld so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000

With no linker script, ld uses its default one, and that one does look for _start. We can do the same ourselves with:

ENTRY(_start)
MEMORY
{

but with no _start global label defined:

arm-linux-gnueabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000

So if you are simply doing

nasm stuff myprog.asm stuff myprog.o
ld myprog.o -o myprog

you are using whatever default linker settings/script come with the tool/environment, and it likely has ENTRY(_start) or an equivalent as the default. If you take complete control of the linker and want to load a program on Linux, you need a safe/sane entry point for the program to work. Otherwise ld defaults to the beginning of the binary or the beginning of .text, which we can test:

SECTIONS
{
    .text   : { *(.text*)   } > two
    .data   : { *(.data*)   } > one
    .bss    : { *(.bss*)    } > three
}

.globl here
here:
    nop

.data
.word 0x12345678

arm-linux-gnueabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000002000


Disassembly of section .text:

00002000 <here>:
    2000:   e1a00000    nop         ; (mov r0, r0)

Disassembly of section .data:

00001000 <.data>:
    1000:   12345678

So ld defaulted to the beginning of .text, not to the beginning (the lowest address) of the binary.

ENTRY(somedata)
MEMORY
{
    one   : ORIGIN = 0x1000, LENGTH = 0x1000
    two   : ORIGIN = 0x2000, LENGTH = 0x1000
    three : ORIGIN = 0x3000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > two
    .data   : { *(.data*)   } > one
    .bss    : { *(.bss*)    } > three
}


.globl here
here:
    nop

.data
.globl somedata
somedata: .word 0x12345678

  Entry point address:               0x1000
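The experiments above line up with the search order the GNU ld manual documents for setting the entry point. The order is: the `-e` option, then `ENTRY()` in the linker script, then a target-specific default symbol, then the start of the code section, then address 0. Here is a simplified Python model of that order. The names `resolve_entry` and its parameters are hypothetical, and the model ignores that `-e` can also take a number. It also assumes there is always a .text. Finally, it assumes the target's built-in default symbol is `start`, so `_start` only matters when a linker script (such as ld's default one) says `ENTRY(_start)`. The runs above suggest this, since `_start` was ignored once ENTRY() was removed.

```python
def resolve_entry(symbols, text_start, cmdline_entry=None, script_entry=None,
                  default_symbol="start"):
    """Simplified model of how GNU ld picks the entry address.

    symbols        -- {name: address} of global symbols in the output
    text_start     -- address of the start of .text (assumed to exist)
    cmdline_entry  -- NAME from `ld -e NAME`, if any
    script_entry   -- NAME from ENTRY(NAME) in the linker script, if any
    default_symbol -- assumed target default when nothing is requested
    Returns (entry_address, warning_or_None).
    """
    # -e on the command line wins over ENTRY() in the script.
    requested = cmdline_entry if cmdline_entry is not None else script_entry
    if requested is not None:
        if requested in symbols:
            return symbols[requested], None
        # Explicitly requested but missing: warn and fall back to .text.
        return text_start, (f"cannot find entry symbol {requested}; "
                            f"defaulting to {text_start:016x}")
    if default_symbol in symbols:
        return symbols[default_symbol], None
    # Nothing requested and no default symbol: silently use the start of .text.
    return text_start, None


# The here/_start/there runs above, expressed with this model:
syms = {"here": 0x1000, "_start": 0x1004, "there": 0x1008}
print(hex(resolve_entry(syms, 0x1000)[0]))                          # 0x1000
print(hex(resolve_entry(syms, 0x1000, cmdline_entry="_start")[0]))  # 0x1004
print(hex(resolve_entry(syms, 0x1000, script_entry="there")[0]))    # 0x1008
```

The last case above, `ENTRY(somedata)` pointing into .data, also goes through: the model, like ld, does not check that the entry symbol is code.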

This is just as easy to do with nasm and ld as it is with gas and ld, as demonstrated above. It shows that _start is no more magic to ld (or even gcc) than main() is. _start seems magic because the default linker scripts name it, so folks assume it is special. main() seems magic because the language defines it as the starting point, but in reality it is the bootstrap that makes it so, and if you simply run

gcc helloworld.c -o helloworld

you are getting the default bootstrap and linker script. But you could write your own bootstrap, or modify the one in your C library, and use it without having a main() in your program; the tools don't care and it will work just fine. This does not hold for all tools, of course: some detect main() and add critical code that might not otherwise be added, especially for C++. The GNU tools, however, are particularly flexible and generic, which makes them usable for many targets, from bare metal to kernel drivers to operating system applications.

Use the tools you have; they are very powerful. Do experiments like the ones above first.

Upvotes: 2

Naveed Hematmal

Reputation: 383

I learned that the starting point of the program must be specified as global _start

No, that's wrong! We can use any name for the starting point instead of _start.

Is it possible to write the global _start part outside the text section?

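Yes! The global directive only makes the symbol visible to the linker, so it can be placed anywhere in the file. What matters is where the _start: label itself is: it must be on the first instruction you want executed, in the text section.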

can the _start part in the global _start be changed? So if I type global _asd or global qwe for defining the starting point of the program, will I get a syntax error?

Yes, it can be changed. You will not get a syntax error, but you need to give the name of the starting point on the command line when linking:

ld -e starting_point_name app.o -o app

Upvotes: 0
