Why are labels not printed while re-writing assembly with just bytes, and how can a program always start at the same memory location?

Question

I have the following assembly code, which I assembled and linked into a final executable.

section .text
global _start

_start: mov eax, 4
        mov ebx, 1
        mov ecx, mesg
        mov edx, 9
        int 0x80
mesg    db      "Kingkong",0xa

Next, I extracted its machine code as hex bytes: 0xb8,0x04,0x00,0x00,0x00,0xbb,0x01,0x00,0x00,0x00,0xb9,0x76,0x80,0x04,0x08,0xba,0x09,0x00,0x00,0x00,0xcd,0x80,0x4b,0x69,0x6e,0x67,0x6b,0x6f,0x6e,0x67,0x0a

and placed them into another program, shown below.

section .text
global _start
_start:
        db 0xb8,0x04,0x00,0x00,0x00,0xbb,0x01,0x00,0x00,0x00,0xb9,0x76,0x80,0x04,0x08,0xba,0x09,0x00,0x00,0x00,0xcd,0x80,0x4b,0x69,0x6e,0x67,0x6b,0x6f,0x6e,0x67,0x0a

When I assemble and link the file above and run objdump on it, I get

08048060 <_start>:
 8048060:       b8 04 00 00 00          mov    $0x4,%eax
 8048065:       bb 01 00 00 00          mov    $0x1,%ebx
 804806a:       b9 76 80 04 08          mov    $0x8048076,%ecx
 804806f:       ba 09 00 00 00          mov    $0x9,%edx
 8048074:       cd 80                   int    $0x80
 8048076:       4b                      dec    %ebx
 8048077:       69 6e 67 6b 6f 6e 67    imul   $0x676e6f6b,0x67(%esi),%ebp
 804807e:       0a                      .byte 0xa

The mesg label does not appear in the final dump. How does the program then figure out the address of the mesg string?

EDIT: After reading the answers, I would like to add a small follow-up question. I understand that labels are not used for the actual addressing and that the address is baked directly into the code. But if addresses are specified like mov $0x8048076,%ecx, what guarantees that the next time the program loads it will start at exactly the same address? What if I wrap this code in a C program? What if I want to run it on another machine with a completely different memory layout?

Vivin Paliath · Accepted Answer

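Labels are translated to offsets or addresses when the code is assembled and linked. Only the resulting number ends up in the machine code; the name itself survives only as symbol or debugging information. Your second program contains nothing but raw bytes, so it never defines a mesg symbol in the first place.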

The line:

mov    $0x8048076, %ecx

contains the value of mesg: the address 0x8048076, which is where your string Kingkong starts.

The program doesn't need to "figure out" what the value of mesg is because it doesn't even know that there is something called mesg. All it sees is an address, which is fine, because that's all it needs.

Using named labels is just convenient and helps with readability. They only really matter to the assembler and linker, which convert each label into its actual address or offset. They can also be used by the debugger (if you instruct the assembler or linker to preserve debugging information) to help you debug your code.

To address your second question:

The addresses that you see are virtual memory addresses, not physical ones. The linker fixed the load address of your executable at link time, and every process gets its own virtual address space, so the OS can place the program at that same virtual address on every run. Your executable doesn't need to know what physical address it ends up at, because the OS maps each virtual address to a location in physical memory at runtime. This is why your executable will work if you run it repeatedly or on another machine (assuming it has been built for that OS and architecture).

You can take a look here and here for more information.
