Reputation: 493
I recently came across this post describing the smallest possible ELF executable for Linux. However, the post was written for 32 bit, and I was unable to get the final version to compile on my machine.
This brings me to the question: what's the smallest x86-64 ELF executable it's possible to write that runs without error?
Abusing or violating the ELF specification is ok as long as current Linux kernels in practice will run it, as with the last couple versions of the MuppetLabs 32-bit teensy ELF executable article.
Upvotes: 14
Views: 7237
Reputation: 13736
Most articles out there give up on ld and resort to hand-crafting the ELF headers way too soon, including the amazing answer from Matteo Italia.
I've discovered you can reach the standard "ELF header + program code" limit of 120-ish bytes using only standard tools, with no need to insert the ELF header in your ASM.
Standard assembly code, with a few tricks:
; tiny.asm
BITS 64
SECTION .text align=1
GLOBAL _start
_start:
; _exit(42)
; Linux zeroes all registers at process start (kernel behavior, not a psABI guarantee), so al/dil are safe to use
mov al, 60 ; Select the _exit syscall (60 in Linux ABI)
mov dil, 42 ; Set the exit code argument for _exit
syscall ; Perform the selected syscall
Remarks:
Use al/dil instead of the common eax/edi or the naive rax/rdi, for a 7-byte code payload. This works because current Linux kernels zero all registers on program start (the psABI itself does not promise it).
Use align=1 so that ld, with its default linker script for a non-PIE, can pick 0x400078 as the program entry point address, putting the payload right after the ELF header. As explained by @ecm, NASM's default is align=16, which makes ld use 8 bytes of padding to get to 0x400080.
And now some fine-tuned command-line arguments:
nasm -f elf64 tiny.asm &&
ld -s -no-pie -z noseparate-code tiny.o -o tiny
Results:
$ wc -c tiny && ./tiny; echo $?
336 tiny
42
That's already better than the 352 bytes Matteo had in his last ld attempt. And the code payload only accounts for 3 of the 16 bytes saved.
But the payload is not the point here. My goal is to get rid of all section headers, so we get to the 120+payload size which is the absolute minimum before manually fiddling with the ELF header.
Given our 7-byte payload, we aim for a 127-byte binary, breaking the ~300 bytes barrier.
$ strip --strip-section-headers tiny && wc -c tiny && ./tiny; echo $?
127 tiny
42
A 62% reduction with a single strip, and goal achieved!
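The entry point addresses and the 127-byte target follow from the sizes of the two mandatory ELF64 structures. Here is a minimal Python sketch of that arithmetic; the helper name align_up is only illustrative, and the 0x400000 base is the address seen in the readelf output for this build rather than a universal constant.

```python
# Sizes of the two mandatory ELF64 structures, in bytes.
EHDR_SIZE = 64    # Elf64_Ehdr ("Size of this header" in readelf)
PHDR_SIZE = 56    # Elf64_Phdr ("Size of program headers" in readelf)
BASE = 0x400000   # load address used by ld for this non-PIE build (assumed)


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


headers_end = BASE + EHDR_SIZE + PHDR_SIZE
entry_align1 = align_up(headers_end, 1)     # SECTION .text align=1
entry_align16 = align_up(headers_end, 16)   # NASM's default section alignment

print(hex(entry_align1))              # 0x400078
print(hex(entry_align16))             # 0x400080
print(entry_align16 - entry_align1)   # 8 bytes of padding
print(EHDR_SIZE + PHDR_SIZE + 7)      # 127 with the 7-byte payload
```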
$ readelf -Wa tiny
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x400078
Start of program headers: 64 (bytes into file)
Start of section headers: 0 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 1
Size of section headers: 0 (bytes)
Number of section headers: 0
Section header string table index: 0
There are no sections in this file.
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x00007f 0x00007f R E 0x1000
There is no dynamic section in this file.
There are no relocations in this file.
No processor specific unwind information to decode
Dynamic symbol information is not available for displaying symbols.
No version information found in this file.
This is a result Matteo and most articles only achieved by pasting the ELF header in ASM and editing the loader address by hand, but now we have tamed nasm, ld and strip to do it automatically for us.
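If readelf is not at hand, the same header fields can be read with Python's struct module. This is only a sketch: parse_ehdr is a hypothetical helper, and the sample bytes are rebuilt from the readelf dump above instead of being read from the real file.

```python
import struct

# Elf64_Ehdr, little-endian: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")


def parse_ehdr(data: bytes) -> dict:
    """Return the Elf64_Ehdr fields of an ELF image as a dict."""
    names = ("e_ident e_type e_machine e_version e_entry e_phoff e_shoff "
             "e_flags e_ehsize e_phentsize e_phnum e_shentsize e_shnum "
             "e_shstrndx").split()
    return dict(zip(names, EHDR.unpack_from(data)))


# Stand-in header rebuilt from the readelf dump; in practice you would pass
# open("tiny", "rb").read() instead.
ident = bytes.fromhex("7f454c46020101") + bytes(9)
sample = EHDR.pack(ident, 2, 62, 1, 0x400078, 64, 0, 0, 64, 56, 1, 0, 0, 0)

hdr = parse_ehdr(sample)
print(hex(hdr["e_entry"]), hdr["e_phnum"], hdr["e_shnum"])
```

A stripped file should report one program header and zero section headers, matching the dump.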
And, for completeness, the 4-byte payload true clone that yields an impressive 124 bytes in just 5 lines, which I believe is the smallest possible size before non-standard approaches like overlapping headers and embedding the payload in them:
SECTION .text align=1
GLOBAL _start
_start:
mov al, 60
syscall
nasm -f elf64 tiny.asm &&
ld -s -no-pie -z noseparate-code tiny.o -o tiny &&
strip --strip-section-headers tiny && wc -c tiny && ./tiny; echo $?
124 tiny
0
A tiny executable and a tiny source!
Upvotes: 4
Reputation: 31
Updated Answer
After seeing the tricks used in @Matteo Italia's answer, I found it's possible to reach 112 bytes, since we can hide not only the string but also the code in the ELF header.
Explanations:
The key idea is hiding everything in the header, including the string "Hello World!\n" and the code that prints it. We should first test which parts of the header are modifiable (that is, we can change the value and the program still executes). Then we hide our data and code in the header as the following code shows (compile with nasm -f bin ./x.asm).
The nop instructions fill the remaining space inside and between the ELF headers, which we can't avoid. We still have space to waste in p_paddr and p_align.
bits 64
org 0x08048000
ehdr: ; Elf64_Ehdr
db 0x7F, "ELF" ; e_ident[0:4], the remaining 12 bytes of e_ident hold code
_start:
mov dl, 13
mov esi,STR
pop rax
syscall
jmp _S0
dw 2 ; e_type
dw 62 ; e_machine
dd 0xff ; e_version
dq _start ; e_entry
dq phdr - $$ ; e_phoff
STR:
db "Hello Wo" ; e_shoff
db "rld!" ; e_flags
dw 0x0a ; e_ehsize, the place where we hide the newline character
dw phdrsize ; e_phentsize
phdr: ; Elf64_Phdr
dw 1 ; e_phnum p_type
dw 0 ; e_shentsize
dw 5 ; e_shnum p_flags
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
dq 0 ; p_offset
dq $$ ; p_vaddr
_S0:
nop ; unused space for more code
nop
nop
nop
nop
nop
jmp _S1 ; p_paddr: these 8 bytes belong to p_paddr; the nops show that we can add more asm code here
dq filesize ; p_filesz
dq filesize ; p_memsz
_S1:
mov eax,60 ; p_align[0:5]
syscall ; p_align[5:7]
nop ; p_align[7:8]
phdrsize equ $ - phdr
filesize equ $ - $$
Original Post:
I have a 129-byte x64 "Hello World!".
Step 1. Compile the following asm code with nasm -f bin hw.asm
; hw.asm
BITS 64
org 0x400000
ehdr: ; Elf64_Ehdr
db 0x7f, "ELF", 2, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 0x3e ; e_machine
dd 1 ; e_version
dq _start ; e_entry
dq phdr - $$ ; e_phoff
dq 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
phdr: ; Elf64_Phdr
dd 1 ; e_phnum ; p_type
; e_shentsize
dd 5 ; e_shnum ; p_flags
; e_shstrndx
ehdrsize equ $ - ehdr
dq 0 ; p_offset
dq $$ ; p_vaddr
dq $$ ; p_paddr
dq filesize ; p_filesz
dq filesize ; p_memsz
dq 0x1000 ; p_align
phdrsize equ $ - phdr
_start:
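; write(0, hello, 60), then exit(0)
pop rax ; rax = argc = 1, the write syscall number (when run without arguments)
mov dl, 60 ; rdx = 60: the length to write, and later the exit syscall number
mov esi, hello ; rsi = address of the string
syscall ; write to fd 0 (rdi is still 0); returns 60 in rax
syscall ; rax = 60 now selects exit, with status rdi = 0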
hello: db "Hello World!", 10 ; 10 is the ASCII code for newline
filesize equ $ - $$
Step 2. Modify it with the following Python script
# Remove the 8-byte p_align field so the code overlaps it, and patch the
# two addresses that move as a result.
with open('./hw', 'rb') as f:
    pro = bytearray(f.read())
print(len(pro))
cut = 0x68  # file offset of p_align
pro[0x18] = cut  # low byte of e_entry: the code now starts at 0x68
pro[0x74] = 0x7c - (0x70 - cut)  # low byte of the address in mov esi, hello
pro = pro[:cut] + pro[0x70:]
print(list(pro))
with open("X", 'wb') as f:
    f.write(pro)
You should get a 129-byte "Hello World" (run chmod +x X first if the file is not executable).
[18:19:02] n132 :: xps ➜ /tmp » strace ./X
execve("./X", ["./X"], 0x7fffba3db670 /* 72 vars */) = 0
write(0, "Hello World!\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 60Hello World!
) = 60
exit(0) = ?
+++ exited with 0 +++
[18:19:04] n132 :: xps ➜ /tmp » ./X
Hello World!
[18:19:11] n132 :: xps ➜ /tmp » ls -la ./X
-rwxrwxr-x 1 n132 n132 129 Jan 29 18:18 ./X
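The magic numbers in the patch script (0x68, 0x74, 0x7c) can be derived from the layout of hw.asm. Here is a hedged sketch of that derivation; the constant names are illustrative, and the instruction lengths are the usual x86-64 encodings of the five instructions in _start.

```python
BASE = 0x400000      # org used in hw.asm
PHDR_OFF = 0x38      # phdr overlaps the last 8 bytes of the ELF header
PHDR_SIZE = 56
P_ALIGN_OFF = PHDR_OFF + PHDR_SIZE - 8   # p_align is the last phdr field

code_off = PHDR_OFF + PHDR_SIZE          # _start in the unpatched file
# pop rax (1) + mov dl, imm8 (2) + mov esi, imm32 (5) + two syscalls (2 + 2)
hello_off = code_off + 1 + 2 + 5 + 2 + 2
imm_off = code_off + 1 + 2 + 1           # imm32 follows the mov esi opcode byte
old_size = hello_off + len(b"Hello World!\n")

removed = code_off - P_ALIGN_OFF         # bytes dropped by the script
print(hex(P_ALIGN_OFF))                  # cut
print(hex(imm_off))                      # byte patched inside mov esi
print(hex(hello_off - removed))          # new low byte of the string address
print(old_size, old_size - removed)      # size before and after the cut
```

The values printed are 0x68, 0x74, 0x74, and then 137 and 129, which matches the offsets hard-coded in the script and the final file size.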
Upvotes: 3
Reputation: 126957
Starting from an answer of mine about the "real" entrypoint of an ELF executable on Linux and "raw" syscalls, we can strip it down to
bits 64
global _start
_start:
mov di,42 ; only the low byte of the exit code is kept,
; so we can use di instead of the full edi/rdi
xor eax,eax
mov al,60 ; shorter than mov eax,60
syscall ; perform the syscall
I don't think you can get it to be any smaller without going out of specs - in particular, the psABI doesn't guarantee anything about the state of eax. This gets assembled to precisely 10 bytes (as opposed to the 7 bytes of the 32 bit payload):
66 bf 2a 00 31 c0 b0 3c 0f 05
The straightforward way (assemble with nasm, link with ld) gives me a 352-byte executable.
The first "real" transformation the author of the teensy article does is building the ELF "by hand"; doing this (with some modifications, as the ELF header for x86_64 is a bit bigger)
bits 64
org 0x08048000
ehdr: ; Elf64_Ehdr
db 0x7F, "ELF", 2, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 62 ; e_machine
dd 1 ; e_version
dq _start ; e_entry
dq phdr - $$ ; e_phoff
dq 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
phdr: ; Elf64_Phdr
dd 1 ; p_type
dd 5 ; p_flags
dq 0 ; p_offset
dq $$ ; p_vaddr
dq $$ ; p_paddr
dq filesize ; p_filesz
dq filesize ; p_memsz
dq 0x1000 ; p_align
phdrsize equ $ - phdr
_start:
mov di,42 ; only the low byte of the exit code is kept,
; so we can use di instead of the full edi/rdi
xor eax,eax
mov al,60 ; shorter than mov eax,60
syscall ; perform the syscall
filesize equ $ - $$
we get down to 130 bytes. This is a tad bigger than the 91-byte executable of the 32-bit article, but that comes from several fields being 64 bits wide instead of 32.
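The gap between 91 and 130 bytes can be reproduced from the structure layouts alone. Here is a small sketch using struct.calcsize, with format strings written out from the Elf32/Elf64 header definitions; the variable names are illustrative.

```python
import struct

# Little-endian layouts of the ELF file header and program header.
ELF32_EHDR = struct.calcsize("<16sHHIIIIIHHHHHH")
ELF32_PHDR = struct.calcsize("<IIIIIIII")
ELF64_EHDR = struct.calcsize("<16sHHIQQQIHHHHHH")
ELF64_PHDR = struct.calcsize("<IIQQQQQQ")

print(ELF32_EHDR, ELF32_PHDR, ELF64_EHDR, ELF64_PHDR)
print(ELF32_EHDR + ELF32_PHDR + 7)     # 32-bit headers + 7-byte payload
print(ELF64_EHDR + ELF64_PHDR + 10)    # 64-bit headers + 10-byte payload
```

The header sizes come out as 52, 32, 64 and 56 bytes, so 36 of the 39 extra bytes are due to wider header fields and only 3 to the longer payload.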
We can then apply some tricks similar to his; the partial overlap of phdr and ehdr can be done, although the order of fields in phdr is different, and we have to overlap p_flags with e_shnum (which however should be ignored due to e_shentsize being 0).
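To see exactly which fields collide when the program header starts 8 bytes before the end of the ELF header (offset 56), the field ranges can be computed mechanically. This is a sketch only; the spans and overlaps helpers are hypothetical names, and the field sizes are those of the ELF64 structures.

```python
# (name, size in bytes) pairs in declaration order, per the ELF64 layout.
EHDR_FIELDS = [("e_ident", 16), ("e_type", 2), ("e_machine", 2),
               ("e_version", 4), ("e_entry", 8), ("e_phoff", 8),
               ("e_shoff", 8), ("e_flags", 4), ("e_ehsize", 2),
               ("e_phentsize", 2), ("e_phnum", 2), ("e_shentsize", 2),
               ("e_shnum", 2), ("e_shstrndx", 2)]
PHDR_FIELDS = [("p_type", 4), ("p_flags", 4), ("p_offset", 8),
               ("p_vaddr", 8), ("p_paddr", 8), ("p_filesz", 8),
               ("p_memsz", 8), ("p_align", 8)]


def spans(fields, start=0):
    """Map each field name to its (begin, end) byte range in the file."""
    out, pos = {}, start
    for name, size in fields:
        out[name] = (pos, pos + size)
        pos += size
    return out


def overlaps(phoff):
    """List (phdr_field, ehdr_field) pairs sharing bytes when phdr starts at phoff."""
    eh, ph = spans(EHDR_FIELDS), spans(PHDR_FIELDS, phoff)
    return [(p, e) for p, (pb, pe) in ph.items()
            for e, (eb, ee) in eh.items() if pb < ee and eb < pe]


for pair in overlaps(56):
    print(pair)
```

With an offset of 56, p_type shares bytes with e_phnum and e_shentsize, and p_flags shares bytes with e_shnum and e_shstrndx, which is why p_type = 1 doubles as e_phnum = 1 in the listing.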
Moving the code inside the header is slightly more difficult, as it's 3 bytes larger, but that part of header is just as big as in the 32 bit case. We overcome this by starting 2 bytes earlier, overwriting the padding byte (ok) and the ABI version field (not ok, but still works).
So, we reach:
bits 64
org 0x08048000
ehdr: ; Elf64_Ehdr
db 0x7F, "ELF", 2, 1 ; e_ident (first 6 bytes; the code below fills the rest)
_start:
mov di,42 ; only the low byte of the exit code is kept,
; so we can use di instead of the full edi/rdi
xor eax,eax
mov al,60 ; shorter than mov eax,60
syscall ; perform the syscall
dw 2 ; e_type
dw 62 ; e_machine
dd 1 ; e_version
dq _start ; e_entry
dq phdr - $$ ; e_phoff
dq 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
phdr: ; Elf64_Phdr
dw 1 ; e_phnum p_type
dw 0 ; e_shentsize
dw 5 ; e_shnum p_flags
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
dq 0 ; p_offset
dq $$ ; p_vaddr
dq $$ ; p_paddr
dq filesize ; p_filesz
dq filesize ; p_memsz
dq 0x1000 ; p_align
phdrsize equ $ - phdr
filesize equ $ - $$
which is 112 bytes long.
Here I stop for the moment, as I don't have much time for this right now. You now have the basic layout with the relevant modifications for 64 bit, so you just have to experiment with more audacious overlaps.
Upvotes: 18