HoldenDinerman
HoldenDinerman

Reputation: 43

Segmentation fault with a variable in SECTION .DATA

I am trying to learn nasm. I want to make a program that prints "Hello, world." n times (in this case 10). I am trying to save the loop register value in a constant so that it is not changed when the body of the loop is executed. When I try to do this I receive a segmentation fault error. I am not sure why this is happening.

My code:

SECTION .DATA
    print_str:          db 'Hello, world.', 10
    print_str_len:      equ $-print_str

    limit:              equ 10
    step:               dw 1

SECTION .TEXT
    GLOBAL _start 

_start:
    mov eax, 4              ; 'write' system call = 4
    mov ebx, 1              ; file descriptor 1 = STDOUT
    mov ecx, print_str      ; string to write
    mov edx, print_str_len  ; length of string to write
    int 80h                 ; call the kernel

    mov eax, [step]         ; moves the step value to eax
    inc eax                 ; Increment
    mov [step], eax         ; moves the eax value to step
    cmp eax, limit          ; Compare sil to the limit
    jle _start              ; Loop while less or equal

exit:
    mov eax, 1              ; 'exit' system call
    mov ebx, 0              ; exit with error code 0
    int 80h                 ; call the kernel

The result:

Hello, world.
Segmentation fault (core dumped)

The cmd:

nasm -f elf64 file.asm -o file.o
ld file.o -o file
./file

Upvotes: 1

Views: 848

Answers (1)

Peter Cordes
Peter Cordes

Reputation: 364190

section .DATA is the direct cause of the crash. Lower-case section .data is special, and linked as a read-write (private) mapping of the executable. Section names are case-sensitive.

Upper-case .DATA is not special for nasm or the linker, and it ends up as part of the text segment mapped read+exec without write permission.

Upper-case .TEXT is also weird: by default objdump -drwC -Mintel only disassembles the .text section (to avoid disassembling data as if it were code), so it shows empty output for your executable.

On newer systems, the default for a section name NASM doesn't recognize doesn't include exec permission, so code in .TEXT will segfault. Same as Assembly section .code and .text behave differently


After starting the program under GDB (gdb ./foo, starti), I looked at the process's memory map from another shell.

$ cat /proc/11343/maps
00400000-00401000 r-xp 00000000 00:31 110651257                          /tmp/foo
7ffff7ffa000-7ffff7ffd000 r--p 00000000 00:00 0                          [vvar]
7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0                          [vdso]
7ffffffde000-7ffffffff000 rwxp 00000000 00:00 0                          [stack]

As you can see, other than the special VDSO mappings and the stack, there's only the one file-backed mapping, and it has read+exec permission only.

Single-stepping inside GDB, the mov eax,DWORD PTR ds:0x400086 load succeeds, but the mov DWORD PTR ds:0x400086,eax store faults. (See the bottom of the x86 tag wiki for GDB asm tips.)

From readelf -a foo, we can see the ELF program headers that tell the OS's program loader how to map it into memory:

$ readelf -a foo   # broken version
  ...
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000bf 0x00000000000000bf  R      0x200000

 Section to Segment mapping:
  Segment Sections...
   00     .DATA .TEXT 

Notice how both .DATA and .TEXT are in the same segment. This is what you'd want for section .rodata (a standard section name where you should put read-only constant data like your string), but it won't work for mutable global variables.

After fixing your asm to use section .data and .text, readelf shows us:

$ readelf -a foo    # fixed version
  ...
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000e7 0x00000000000000e7  R E    0x200000
  LOAD           0x00000000000000e8 0x00000000006000e8 0x00000000006000e8
                 0x0000000000000010 0x0000000000000010  RW     0x200000

 Section to Segment mapping:
  Segment Sections...
   00     .text 
   01     .data 

Notice how segment 00 is R + E without W, and the .text section is in there. Segment 01 is RW (read + write) without exec, and the .data section is there.

The LOAD tag means they're mapped into the process's virtual address space. Some section (like debug info) aren't, and are just metadata for other tools. But NASM flags unknown section names as progbits, i.e. loaded, which is why it was able to link and have the load not segfault.


After fixing it to use section .data, your program runs without segfaulting.

The loop runs for one iteration, because the 2 bytes following step: dw 1 are not zero. After the dword load, RAX = 0x2c0001 on my system. (cmp between 0x002c0002 and 0xa makes the LE condition false because it's not less or equal.)

dw means "data word" or "define word". Use dd for a data dword.


BTW, there's no need to keep your loop counter in memory. You're not using RDI, RSI, RBP, or R8..R15 for anything so you could just keep it in a register. Like mov edi, limit before the loop, and dec edi / jnz at the bottom.

But actually you should use the 64-bit syscall ABI if you want to build 64-bit code, not the 32-bit int 0x80 ABI. What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?. Or build 32-bit executables if you're following a guide or tutorial written for that.

Anyway, in that case you'd be able to use ebx as your loop counter, because the syscall ABI uses different args for registers.

Upvotes: 4

Related Questions