Reputation: 405
Just out of interest, I would like to write a small program in machine code.
I am currently learning about registers, the ALU, buses and memory, and I'm slightly fascinated that instructions can be written in binary instead of in an assembly language.
Would a compiler need to be used?
Preferably one that runs on OSX.
Upvotes: 6
Views: 4716
Reputation: 364160
If you want your machine-code inside a standard object file, with metadata so you can link it with and call it from a C program, you'd probably still want to use an assembler.
Besides object-file metadata, this gives you the huge advantage of being able to write comments. You also get labels, so the assembler can calculate displacements for a manually encoded jump, like db 0xE9 followed by dd target - ($ + 4) to encode an x86 jmp rel32 (0xE8 would be call rel32), or for RIP-relative addressing modes.
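To make the displacement arithmetic concrete, here is a minimal Python sketch of what the assembler computes for a rel32 jump. The function name and the example addresses are made up for illustration; the only facts relied on are that 0xE9 is the jmp rel32 opcode and that the displacement is measured from the end of the 5-byte instruction.

```python
import struct


def encode_jmp_rel32(jmp_addr: int, target_addr: int) -> bytes:
    """Encode a `jmp rel32` located at jmp_addr that lands on target_addr.

    Hypothetical helper, not part of any assembler's API.
    """
    opcode = 0xE9          # jmp rel32 (0xE8 is call rel32)
    insn_len = 5           # 1 opcode byte + 4 displacement bytes
    # The displacement is relative to the address of the NEXT instruction.
    rel32 = target_addr - (jmp_addr + insn_len)
    return bytes([opcode]) + struct.pack("<i", rel32)  # little-endian, signed


if __name__ == "__main__":
    print(encode_jmp_rel32(0x00, 0x10).hex(" "))  # forward jump
    print(encode_jmp_rel32(0x10, 0x00).hex(" "))  # backward jump
```

A forward jump gets a small positive displacement and a backward jump a negative one stored in two's complement, which is exactly the bookkeeping a label saves you from.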
Assembler source code usually uses mnemonics like add eax, ecx to assemble the bytes 01 c8 into the output file (x86). But that source line is exactly equivalent to NASM-syntax db 0x01, 0xc8 (assuming BITS 32 or BITS 64), or with GAS syntax .byte 0x01, 0xc8.
Either way, those source lines will cause the assembler to output the same 2 bytes into the current section of the output file. That's what assemblers do: write bytes into an output file based on some text source. Asm source is a convenient language that maps pretty directly to and from machine code. For x86, the assembler has a few choices to make: it picks the shortest encoding, and when both operands are registers it chooses one of two possible opcodes, e.g. between add r/m32, r32 and add r32, r/m32.
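As a rough illustration of that choice, the Python sketch below builds both encodings of a register-to-register add. The helper names are invented for this example; the opcodes 0x01 (add r/m32, r32) and 0x03 (add r32, r/m32) and the ModRM layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0) are the standard x86 ones.

```python
# Register numbers as used in the ModRM byte (32-bit registers, no REX prefix).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm_reg_direct(reg_field: int, rm_field: int) -> int:
    """Build a ModRM byte with mod=11 (both operands are registers)."""
    return 0b11_000_000 | (reg_field << 3) | rm_field


def encode_add_reg_reg(dst: str, src: str) -> tuple[bytes, bytes]:
    """Return both valid encodings of `add dst, src` (hypothetical helper)."""
    d, s = REG32[dst], REG32[src]
    # Opcode 01: add r/m32, r32 -> destination goes in r/m, source in reg.
    form_01 = bytes([0x01, modrm_reg_direct(reg_field=s, rm_field=d)])
    # Opcode 03: add r32, r/m32 -> destination goes in reg, source in r/m.
    form_03 = bytes([0x03, modrm_reg_direct(reg_field=d, rm_field=s)])
    return form_01, form_03


if __name__ == "__main__":
    for encoding in encode_add_reg_reg("eax", "ecx"):
        print(encoding.hex(" "))
```

For add eax, ecx this yields 01 c8 and 03 c1. Both do the same thing when executed; the assembler simply has to pick one.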
Since you're on MacOS, NASM isn't the most reliable choice. It's had multiple bugs in its MachO64 output format support. AFAIK the current version works, but you might rather use GNU assembler (GAS) syntax, which clang, OS X's default compiler, can assemble.
OTOH, NASM does have a convenient flat-binary output mode which you can use to get just the machine-code bytes without the object file around them, without having to use objcopy or ld to produce a flat binary.
You can write int add(int a, int b) { return a+b; } in asm for x86-64 MacOS like this. (MacOS prepends a leading underscore to C names.)
;section .text ; already the default if you haven't used section .data or anything
; NASM syntax:
global _add ; externally visible symbol name for linking
_add:
lea eax, [rdi+rsi]
ret
We can assemble this with nasm -fmacho64 mac-add.asm, and get a 238-byte mac-add.o output file. We can get a byte-for-byte identical output file from writing the bytes with db directives / pseudo-instructions. But first, let's cheat and find out what the bytes were, so we don't waste time looking up the encoding in tables manually.
(Once you know the basics of how x86 machine code instructions are put together, with prefixes, opcode, ModRM + optional extra bytes, then optional immediate, you'll find it usually uninteresting to look up the actual opcode numbers; the interesting thing is usually just instruction length. Or anything you're curious about, you can look at in the disassembly output.)
For example, rbp not allowed as SIB base? and How to read the Intel Opcode notation give some details about instruction encoding. Understanding how these work is sufficient to have a pretty good idea of x86 machine code without actually knowing the specific numbers for lots of instructions.
$ objdump -d -Mintel mac-add.o
(doesn't support MachO64 object files on my Linux desktop)
$ llvm-objdump -d -x86-asm-syntax=intel mac-add.o
mac-add.o: file format Mach-O 64-bit x86-64
Disassembly of section __TEXT,__text:
_add:
0: 8d 04 37 lea eax, [rdi + rsi]
3: c3 ret
So in NASM source, mac-raw-add.asm:
global _add
_add: ; we're still letting the assembler make object-file metadata
db 0x8d, 0x04, 0x37 ; lea eax, [rdi+rsi]
db 0xc3 ; ret
Assembling this with the same nasm -fmacho64 makes a byte-for-byte identical object file. cmp mac-*.o prints no output and returns true. You could link it with a C program with clang -O2 -g main.c mac-raw-add.o.
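If you are curious how the three lea bytes 8d 04 37 from the disassembly break down, here is a small Python sketch that splits the ModRM and SIB bytes into their bit fields. It is deliberately limited to this one addressing form (mod=00, r/m=100 meaning "SIB follows", no displacement, no REX prefix); the function names are made up for the example.

```python
# Register names indexed by their 3-bit encoding (no REX prefix).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_fields(byte: int) -> tuple[int, int, int]:
    """Split a ModRM or SIB byte into its 2-bit, 3-bit and 3-bit fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def describe_lea(insn: bytes) -> str:
    """Decode `lea r32, [base + index*scale]` (hypothetical, narrow helper)."""
    opcode, modrm, sib = insn[0], insn[1], insn[2]
    if opcode != 0x8D:
        raise ValueError("not the lea opcode")
    mod, reg, rm = split_fields(modrm)        # mod, destination reg, r/m
    scale_bits, index, base = split_fields(sib)
    # Only handle: no displacement, SIB present, a real index and base register.
    if mod != 0b00 or rm != 0b100 or index == 0b100 or base == 0b101:
        raise ValueError("addressing form not covered by this sketch")
    return f"lea {REG32[reg]}, [{REG64[base]} + {REG64[index]}*{1 << scale_bits}]"


if __name__ == "__main__":
    print(describe_lea(bytes.fromhex("8d 04 37")))
```

Here ModRM 0x04 says "destination eax, address comes from a SIB byte", and SIB 0x37 says "base rdi, index rsi, scale 1".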
One of the fun things you can do in machine code but not asm is have an instruction overlap other instructions, e.g. enter a loop 4 bytes in with the 1-byte opcode for cmp eax, imm32 instead of a 2-byte jmp rel8. But this is only useful for "code golf" (optimizing for code-size at the expense of everything else, including performance).
Modern CPUs don't like it when they have to decode some code bytes from a different start point than they already decoded from. Some AMD CPUs mark instruction boundaries in L1i cache. I forget if/why Intel CPUs would have a problem. I'm not sure if it would conflict in the uop cache; Agner Fog's microarch guide says for Sandybridge "The same piece of code can have multiple entries in the μop cache if it has multiple jump entries.", but IDK if that works for different decoding of the same bytes.
Anyway, you can do crazy stuff like:
global _copy_nonzero_ints
_copy_nonzero_ints: ;; void f(int *dst, int *src)
xor edx, edx
db 0x3d ; opcode for cmp eax, imm32. Consumes the next 4 bytes as its immediate
;; BAD FOR PERFORMANCE, DON'T DO THIS NORMALLY
.loop: ; do {
mov [rdi + rdx*4 - 4], eax ; 4 bytes long: opcode + ModRM + SIB + disp8. Skipped on first loop iteration: decoded as the immediate for cmp
mov eax, [rsi + rdx*4]
inc edx ; only works for array sizes < 4 * 4GB
test eax, eax
jnz .loop ; }while(src[i] != 0)
ret
Notice that we have the loop-branch at the bottom like we want, but we load and test a dword before storing it. This hypothetical loop doesn't want to store the terminating 0 dword. Normally you'd jmp into the loop to a label, or peel the load+test from the first iteration to conditionally jump over the loop, or fall into the loop to store the first element if it should run non-zero times. (Why are loops always compiled into "do...while" style (tail jump)?)
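The control flow of that loop can be modelled in a few lines of Python. This is only a behavioural sketch, assuming Python lists in place of the int arrays; the skip_store flag stands in for the cmp opcode byte swallowing the store's 4 bytes on the first pass.

```python
def copy_nonzero_ints(src: list[int]) -> list[int]:
    """Model of the asm loop: copy ints up to, but not including, the first 0."""
    dst = []
    i = 0                 # plays the role of edx/rdx
    eax = None
    skip_store = True     # first pass: the store bytes are eaten as cmp's imm32
    while True:
        if not skip_store:
            dst.append(eax)       # mov [rdi + rdx*4 - 4], eax
        skip_store = False
        eax = src[i]              # mov eax, [rsi + rdx*4]
        i += 1                    # inc edx
        if eax == 0:              # test eax, eax / jnz .loop
            break
    return dst


if __name__ == "__main__":
    print(copy_nonzero_ints([5, 7, 0, 9]))
```

Each store writes the element loaded on the previous iteration, which is why the asm store uses a -4 displacement and why the terminating 0 is never written.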
The first time through the loop, it decodes as
0: 31 d2 xor edx,edx
2: 3d 89 44 97 fc cmp eax,0xfc974489
7: 8b 04 96 mov eax,DWORD PTR [rsi+rdx*4]
a: ff c2 inc edx
c: 85 c0 test eax,eax
e: 75 f3 jne 3 <_copy_nonzero_ints+0x3>
(From yasm -felf64 foo.asm && objdump -drwC -Mintel foo.o. YASM doesn't create visible symbol-table entries for .label local labels; NASM does, even if you don't specify extra debug info.)
After the first jnz is taken, it decodes as:
0000000000000000 <_copy_nonzero_ints>:
0: 31 d2 xor edx,edx
2: 3d .byte 0x3d
0000000000000003 <_copy_nonzero_ints.loop>:
3: 89 44 97 fc mov DWORD PTR [rdi+rdx*4-0x4],eax
7: 8b 04 96 mov eax,DWORD PTR [rsi+rdx*4]
a: ff c2 inc edx
c: 85 c0 test eax,eax
e: 75 f3 jne 3 <_copy_nonzero_ints.loop>
10: c3 ret
This also works with things like db 0xb9, 0x7b: the first 2 bytes of mov ecx, 123, which consumes the next 3 bytes as the high bytes of its immediate. That leaves CL with a known value, while the high bytes of ECX depend on the 3 bytes of code. If you can find instructions that have the encoding you want, you might even be able to use your code bytes as useful immediate data.
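The numbers in the two disassembly listings can be checked with a short Python sketch that reads the same 17 code bytes both ways. The byte string is copied from the listings; the variable names are just for illustration, and pairing b9 7b with the first three store bytes is a hypothetical combination, not something from the listings.

```python
# The 17 bytes of _copy_nonzero_ints, exactly as shown in the disassembly.
code = bytes.fromhex("31 d2 3d 89 44 97 fc 8b 04 96 ff c2 85 c0 75 f3 c3")

# First pass: opcode 3d at offset 2 takes the next 4 bytes as a little-endian imm32.
cmp_imm32 = int.from_bytes(code[3:7], "little")

# jne rel8 at offset 0xe: the signed displacement is relative to the next instruction.
jne_offset = 0xE
rel8 = int.from_bytes(code[jne_offset + 1:jne_offset + 2], "little", signed=True)
jne_target = (jne_offset + 2) + rel8

# Hypothetical variant: b9 7b followed by the first 3 store bytes as the rest of imm32.
mov_ecx_imm32 = int.from_bytes(bytes([0x7B]) + code[3:6], "little")

if __name__ == "__main__":
    print(hex(cmp_imm32))            # the cmp immediate seen on the first pass
    print(rel8, hex(jne_target))     # the branch lands on the overlapped store
    print(hex(mov_ecx_imm32), mov_ecx_imm32 & 0xFF)  # low byte (CL) is 123
```

The cmp immediate comes out as 0xfc974489, matching the first listing, and the branch target is offset 3, the start of the store that was hidden inside that immediate.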
The above loop was just a made-up example to illustrate a possible use-case for that trick. It's not the most efficient way to implement that function; you'd probably use lodsd and stosd if actually golfing for code-size.
Also, this is pretty slow vs. using SSE2 to copy + check 4 dwords at a time, so you wouldn't normally write this anyway for performance. But imagine you're optimizing for code size. (and see Tips for golfing in x86/x64 machine code)
Also, you might index the src relative to the dst: do sub rsi, rdi before the loop, so that inside the loop you can use add rdi, 4, with mov [rdi-4], eax stores (whose non-indexed addressing mode lets the store-address uop run on port 7 on Intel, so this is more hyperthreading-friendly) and mov eax, [rsi+rdi] loads.
Upvotes: 0
Reputation: 24847
You need an assembler, you really do, as other posters have said. Writing binary instruction codes is so mind-numbingly boring, and has to be so correct, that only a machine should do it. On a non-trivial OS, like OSX, Linux or Windows, the correct header information must be supplied to generate an executable file. Again, this is best done by an assembler package that can link the correct headers in to ensure that you have data, stack and execution for your instructions. Then, your assembler program will crash, and again, and again, for ages :D.
Writing binary instructions is usually classed as torture. Doing it violates basic human rights. If you are ever asked to do it, outsource it to Gitmo.
Get an assembler.
Rgds, Martin
Upvotes: 1
Reputation: 231113
You would not use a compiler to write raw machine code. You would use a hex editor. Unfortunately, I don't use OSX, so I can't provide you a specific link to one.
If you write machine code, you will need to learn how to write the binary headers required by your OS as well. I would recommend doing so and testing with an assembler in raw output format first; once you understand the binary layout it is a purely mechanical task to hand-assemble this to machine code.
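If you don't have a hex editor handy, a few lines of Python can stand in for one. This is a minimal sketch that turns hand-written hex into a flat binary file; the function name and the output filename are made up, and the result is only raw code bytes (the lea/ret pair from the other answer), with none of the OS-specific headers an executable needs.

```python
import os
import tempfile


def write_flat_binary(hex_text: str, path: str) -> bytes:
    """Convert space-separated hex digits to bytes and write them to path."""
    code = bytes.fromhex(hex_text)   # spaces between byte pairs are ignored
    with open(path, "wb") as out:
        out.write(code)
    return code


if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "add.bin")  # hypothetical filename
    written = write_flat_binary("8d 04 37 c3", out_path)    # lea eax,[rdi+rsi] ; ret
    print(len(written), "bytes written to", out_path)
```

You can compare such a file against an assembler's raw-output result for the same instructions to check your hand-assembly.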
Upvotes: 5
Reputation: 63825
You would use a hex editor. I recommend instead of doing that though, learn assembler first. Assembler is basically a language with a 1:1 correspondence between human readable mnemonics and the machine readable hex bytes. For that, you would probably like to look at http://ref.x86asm.net/ and find an assembler that works on x86 Macs. I believe yasm should work.
Writing anything directly in hex is extremely difficult, and your time would probably be better spent learning assembly and the underlying machine code that an assembler generates.
Upvotes: 3
Reputation: 124642
A compiler turns your non-machine code into machine code... so you would not need a compiler...
Upvotes: 0