Riess Howder

Reputation: 405

What compiler would I use to write machine language?

Just out of interest, I would like to write a small program in machine code.

I am currently learning about registers, the ALU, buses and memory, and I'm slightly fascinated that instructions can be written in binary instead of in an assembly language.

Would a compiler need to be used?

Preferably one that runs on OSX.

Upvotes: 6

Views: 4716

Answers (5)

Peter Cordes

Reputation: 364160

If you want your machine-code inside a standard object file, with metadata so you can link it with and call it from a C program, you'd probably still want to use an assembler.

Besides object-file metadata, this gives you the huge advantage of being able to write comments. It also gives you labels, so the assembler can calculate displacements for manual branch encoding, like db 0xE8 ; dd target - ($ + 4) to encode an x86 call rel32 (0xE8 is the call rel32 opcode; jmp rel32 is 0xE9). Labels help the same way with RIP-relative addressing modes.
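
To make the db 0xE8 ; dd target - ($ + 4) arithmetic concrete, here is a C sketch of the same calculation (encode_call_rel32 is a hypothetical name, not from any library). The rel32 field is relative to the end of the 5-byte instruction; inside the dd, $ is one byte past the opcode, so $ + 4 is exactly that end address.

```c
#include <assert.h>
#include <stdint.h>

/* Encode `call rel32` (opcode 0xE8) placed at address insn_addr.
   rel32 = target - (address of the NEXT instruction), i.e. the end
   of this 5-byte instruction. Assumes addresses fit below 2^63. */
static void encode_call_rel32(uint8_t out[5], uint64_t insn_addr, uint64_t target)
{
    int64_t rel = (int64_t)target - (int64_t)(insn_addr + 5);
    assert(rel >= INT32_MIN && rel <= INT32_MAX);  /* must fit a signed 32-bit field */
    uint32_t u = (uint32_t)rel;                    /* two's-complement bit pattern */
    out[0] = 0xE8;
    for (int i = 0; i < 4; i++)                    /* displacements are little-endian */
        out[1 + i] = (uint8_t)(u >> (8 * i));
}
```

For example, a call at 0x1000 to 0x1020 encodes as e8 1b 00 00 00, and a call to itself has rel32 = -5, giving e8 fb ff ff ff. Doing this by hand for every branch is exactly the bookkeeping that labels save you.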


Assembler source code usually uses mnemonics like add eax, ecx to assemble the bytes 01 c8 into the output file (x86). But that source line is exactly equivalent to NASM-syntax db 0x01, 0xc8 (assuming BITS 32 or BITS 64), or with GAS syntax .byte 0x01, 0xc8.

Either way, those source lines will cause the assembler to output the same 2 bytes into the current section of the output file. That's what assemblers do: write bytes into an output file based on some text source. Asm source is a convenient language that maps pretty directly to/from machine code. For x86, the assembler does have a few choices to make: picking the shortest encoding, and choosing between two possible opcodes, e.g. add r/m32, r32 vs. add r32, r/m32, when both operands are registers.
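
Here is a C sketch of that opcode choice for two 32-bit registers (the enum and function names are hypothetical). A register-to-register ModRM byte is mod=11, then the 3-bit reg field, then the 3-bit r/m field; which operand lands in which field depends on the opcode.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit register numbers as used in the ModRM reg and r/m fields. */
enum reg32 { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Register-direct ModRM byte: mod = 11, then reg, then r/m. */
static uint8_t modrm_reg_reg(unsigned reg, unsigned rm)
{
    assert(reg < 8 && rm < 8);
    return (uint8_t)(0xC0 | (reg << 3) | rm);
}

/* Encode `add dst, src` (no REX prefix needed for these registers).
   form 0: opcode 01 /r = add r/m32, r32  (dst in r/m, src in reg)
   form 1: opcode 03 /r = add r32, r/m32  (dst in reg, src in r/m)
   Both perform the same operation; an assembler simply picks one. */
static void encode_add_r32_r32(uint8_t out[2], enum reg32 dst, enum reg32 src, int form)
{
    if (form == 0) {
        out[0] = 0x01;
        out[1] = modrm_reg_reg(src, dst);
    } else {
        out[0] = 0x03;
        out[1] = modrm_reg_reg(dst, src);
    }
}
```

With dst = EAX and src = ECX, form 0 gives the 01 c8 bytes mentioned above, while form 1 gives 03 c1, a different but equivalent encoding of add eax, ecx.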

Since you're on MacOS, NASM isn't the most reliable choice: it has had multiple bugs in its MachO64 output-format support. AFAIK the current version works, but you might rather use GNU assembler (GAS) syntax, which OS X's default compiler, clang, can assemble.

OTOH, NASM does have a convenient flat-binary output mode (nasm -f bin) which you can use to get just the machine-code bytes without the object file around them, without having to use objcopy or ld to extract a flat binary.
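
A flat binary is nothing more than the bytes themselves. As a sketch of that idea without any assembler (write_flat_binary and the file name in the usage note are hypothetical), plain C stdio can write the same kind of file:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write raw machine-code bytes to `path` with no headers or metadata:
   the file contents are exactly the bytes given, like a flat binary.
   Returns 0 on success, -1 on any I/O error. */
static int write_flat_binary(const char *path, const uint8_t *code, size_t len)
{
    assert(path != NULL && (code != NULL || len == 0));
    FILE *f = fopen(path, "wb");          /* "b": no newline translation */
    if (f == NULL)
        return -1;
    size_t written = fwrite(code, 1, len, f);
    int close_err = fclose(f);
    return (written == len && close_err == 0) ? 0 : -1;
}
```

Writing the bytes { 0x8d, 0x04, 0x37, 0xc3 } to, say, add_flat.bin and dumping it with xxd or hexdump -C should show just those 4 bytes. Note that such a file has no header, so the OS can't load it as a program by itself.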

You can write int add(int a, int b) { return a+b; } in asm for x86-64 MacOS like this. (MacOS prepends C names with a leading underscore)

;section .text        ; already the default if you haven't used section .data or anything

; NASM syntax:
global _add                    ; externally visible symbol name for linking
_add:
    lea   eax, [rdi+rsi]
    ret

We can assemble this with nasm -fmacho64 mac-add.asm and get a 238-byte mac-add.o output file. We can get a byte-for-byte identical output file by writing the bytes with db directives / pseudo-instructions. But first, let's cheat and find out what the bytes are, so we don't waste time looking up the encoding in tables manually.

(Once you know the basics of how x86 machine-code instructions are put together, with prefixes, opcode, ModRM + optional extra bytes, then an optional immediate, you'll usually find it uninteresting to look up the actual opcode numbers; the interesting thing is usually just instruction length. Anything else you're curious about, you can look at in the disassembly output.)

For example, "rbp not allowed as SIB base?" and "How to read the Intel Opcode notation" give some details about instruction encoding. Understanding how these work is sufficient to have a pretty good idea of x86 machine code without actually knowing the specific numbers for lots of instructions.

$ objdump -d -Mintel mac-add.o
  (doesn't support MachO64 object files on my Linux desktop)
$ llvm-objdump -d -x86-asm-syntax=intel mac-add.o

mac-add.o:      file format Mach-O 64-bit x86-64

Disassembly of section __TEXT,__text:
_add:
       0:       8d 04 37        lea     eax, [rdi + rsi]
       3:       c3      ret
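
To connect those bytes to the encoding rules mentioned earlier, here is a small C decoder sketch for 8d 04 37 (all names are hypothetical). 0x8d is the lea opcode, 0x04 is the ModRM byte, and 0x37 is the SIB byte; register numbers are rax=0, rcx=1, rdx=2, rbx=3, rsp=4, rbp=5, rsi=6, rdi=7 (without REX extension bits).

```c
#include <assert.h>
#include <stdint.h>

/* Field split of the ModRM and SIB bytes (each is 2+3+3 bits). */
struct modrm { unsigned mod, reg, rm; };
struct sib   { unsigned scale, index, base; };

static struct modrm split_modrm(uint8_t b)
{
    struct modrm m = { b >> 6, (b >> 3) & 7u, b & 7u };
    return m;
}

static struct sib split_sib(uint8_t b)
{
    struct sib s = { b >> 6, (b >> 3) & 7u, b & 7u };
    return s;
}

/* In 32/64-bit addressing, r/m = 100 with mod != 11 means a SIB byte follows. */
static int has_sib(struct modrm m)
{
    return m.mod != 3 && m.rm == 4;
}

/* With mod = 00, SIB base = 101 does NOT mean rbp: it means "no base,
   disp32 follows". That is why [rbp + rdx*4] must be encoded with
   mod = 01 and a disp8 of 0 (the same applies to r13). */
static int sib_base_is_disp32(struct modrm m, struct sib s)
{
    return m.mod == 0 && s.base == 5;
}
```

For 0x04 this gives mod=00, reg=000 (eax, the destination) and rm=100 (SIB follows); for 0x37 it gives scale=00 (x1), index=110 (rsi) and base=111 (rdi), i.e. [rdi + rsi*1].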

So in NASM source, mac-raw-add.asm:

global _add
_add:                     ; we're still letting the assembler make object-file metadata
  db 0x8d, 0x04, 0x37     ; lea eax, [rdi+rsi]
  db 0xc3                 ; ret

Assembling this with the same nasm -fmacho64 makes a byte-for-byte identical object file. cmp mac-*.o prints no output and returns true. You could link it with a C program with clang -O2 -g main.c mac-raw-add.o.
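
If you just want to try the raw bytes from C without any assembler or object file, one alternative (not the approach above, and platform-dependent) is to copy them into executable memory at run time. This sketch assumes x86-64 macOS or Linux with the System V calling convention (ints in edi/esi, result in eax) and POSIX mmap/mprotect; call_add_bytes is a hypothetical name. Where executable memory is refused (e.g. a hardened runtime) or the CPU isn't x86-64, it just reports failure.

```c
#define _DEFAULT_SOURCE 1   /* expose MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && (defined(__APPLE__) || defined(__linux__))
#include <sys/mman.h>
#include <unistd.h>
#define HAVE_X86_64_SYSV_MMAP 1
#endif

/* The bytes of `lea eax, [rdi+rsi]; ret` from the disassembly. */
static const uint8_t add_code[] = { 0x8d, 0x04, 0x37, 0xc3 };

/* Copy add_code into a fresh page, make it executable, call it as
   int add(int, int). Returns 0 and stores the sum in *out on success,
   -1 if unsupported here or if the OS refuses executable memory. */
static int call_add_bytes(int a, int b, int *out)
{
    assert(out != NULL);
#ifdef HAVE_X86_64_SYSV_MMAP
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, add_code, sizeof add_code);
    /* W^X: drop write permission before allowing execution. */
    if (mprotect(mem, (size_t)page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, (size_t)page);
        return -1;
    }
    int (*fn)(int, int) = (int (*)(int, int))mem;  /* POSIX-style data-to-code cast */
    *out = fn(a, b);
    munmap(mem, (size_t)page);
    return 0;
#else
    (void)a; (void)b;
    return -1;
#endif
}
```

When call_add_bytes(2, 3, &r) returns 0, r should hold 5. Linking against mac-raw-add.o as shown above remains the cleaner route; this only shows that the CPU runs whatever bytes it is pointed at.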


Silly computer tricks with machine code

One of the fun things you can do in machine code but not in asm is make an instruction overlap other instructions, e.g. enter a loop 4 bytes in by using the 1-byte opcode for cmp eax, imm32 (which swallows the next 4 bytes as its immediate) instead of a 2-byte jmp rel8. But this is only useful for "code golf" (optimizing for code size at the expense of everything else, including performance).

Modern CPUs don't like it when they have to decode some code bytes from a different start point than they already decoded from. Some AMD CPUs mark instruction boundaries in L1i cache. I forget if/why Intel CPUs would have a problem. I'm not sure if it would conflict in the uop cache; Agner Fog's microarch guide says for Sandybridge "The same piece of code can have multiple entries in the μop cache if it has multiple jump entries.", but IDK if that works for different decoding of the same bytes.

Anyway, you can do crazy stuff like:

global _copy_nonzero_ints
_copy_nonzero_ints:      ;; void f(int *dst, int *src)

   xor  edx, edx
   db 0x3d       ; opcode for cmp eax, imm32.  Consumes the next 4 bytes as its immediate
   ;;   BAD FOR PERFORMANCE, DON'T DO THIS NORMALLY
.loop:                        ; do {
    mov  [rdi + rdx*4 - 4], eax    ; 4 bytes long: opcode + ModRM + SIB + disp8.  Skipped on first loop iteration: decoded as the immediate for cmp
    mov  eax, [rsi + rdx*4]
    inc  edx                       ; only works for array sizes < 4 * 4GB
    test eax, eax
    jnz  .loop                ; }while(src[i] != 0)

    ret

Notice that we have the loop-branch at the bottom like we want, but we load and test a dword before storing it. This hypothetical loop doesn't want to store the terminating 0 dword. Normally you'd jmp into the loop to a label, or peel the load+test from the first iteration to conditionally jump over the loop, or fall into the loop to store the first element if it should run non-zero times. (Why are loops always compiled into "do...while" style (tail jump)?)
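
For reference, here is what that loop computes, written as plain C. This is only a sketch for checking the semantics; a compiler would not turn it into the overlapping-instruction version, and the size_t return value is an addition (the asm returns nothing) so the result can be tested.

```c
#include <assert.h>
#include <stddef.h>

/* Copy ints from src to dst until the first 0, without storing that
   terminating 0. Returns the number of elements copied. */
static size_t copy_nonzero_ints(int *dst, const int *src)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    int v = src[0];        /* load + test before any store, like the asm's first pass */
    while (v != 0) {
        dst[i] = v;        /* the store the cmp trick skips on the first iteration */
        i++;
        v = src[i];
    }
    return i;
}
```

With src = {7, -1, 42, 0, 99}, it copies 3 elements and leaves dst[3] untouched.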

The first time through the loop, it decodes as

   0:   31 d2                   xor    edx,edx
   2:   3d 89 44 97 fc          cmp    eax,0xfc974489
   7:   8b 04 96                mov    eax,DWORD PTR [rsi+rdx*4]
   a:   ff c2                   inc    edx
   c:   85 c0                   test   eax,eax
   e:   75 f3                   jne    3 <_copy_nonzero_ints+0x3>

(From yasm -felf64 foo.asm && objdump -drwC -Mintel foo.o.
 YASM doesn't create visible symbol-table entries for .label local labels;
 NASM does, even if you don't specify extra debug info.)

After the first jnz is taken, it decodes as:

0000000000000000 <_copy_nonzero_ints>:
   0:   31 d2                   xor    edx,edx
   2:   3d                      .byte 0x3d

0000000000000003 <_copy_nonzero_ints.loop>:
   3:   89 44 97 fc             mov    DWORD PTR [rdi+rdx*4-0x4],eax
   7:   8b 04 96                mov    eax,DWORD PTR [rsi+rdx*4]
   a:   ff c2                   inc    edx
   c:   85 c0                   test   eax,eax
   e:   75 f3                   jne    3 <_copy_nonzero_ints.loop>
  10:   c3                      ret    

This also works with things like db 0xb9, 0x7b: the first 2 bytes of mov ecx, 123, which consumes the next 3 bytes of code as the high bytes of the immediate. That leaves CL with a known value, while the high bytes of ECX depend on those 3 bytes of code. If you can find instructions that have the encoding you want, you might actually be able to use your code as useful immediate data instead.


The above loop was just a made up example to illustrate a possible use-case for that trick. It's not the most efficient way to implement that function; you'd probably use lodsd and stosd if actually golfing for code-size.

Also, this is pretty slow vs. using SSE2 to copy + check 4 dwords at a time, so you wouldn't normally write this anyway for performance. But imagine you're optimizing for code size. (and see Tips for golfing in x86/x64 machine code)

Also, you might index the src relative to the dst, like sub rsi, rdi before the loop, so you can use add rdi, 4 inside the loop, with mov [rdi-4], eax stores (which can run on port 7 on Intel so this is more hyperthreading-friendly), and mov eax, [rsi+rdi] loads.

Upvotes: 0

Martin James

Reputation: 24847

You need an assembler, you really do, as other posters have said. Writing binary instruction codes is so mind-numbingly boring, and has to be so correct, that only a machine should do it. On a non-trivial OS like OSX, Linux or Windows, the correct header information must be supplied to generate an executable file. Again, this is best done by an assembler package that can link the correct headers in to ensure that you have data, stack and execution for your instructions. Then your assembler program will crash, and again, and again, for ages :D.

Writing binary instructions is usually classed as torture. Doing it violates basic human rights. If you are ever asked to do it, outsource it to Gitmo.

Get an assembler.

Rgds, Martin

Upvotes: 1

bdonlan

Reputation: 231113

You would not use a compiler to write raw machine code. You would use a hex editor. Unfortunately, I don't use OSX, so I can't provide you a specific link to one.
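
What a hex editor does for you is turn the hex digits you type into raw bytes in a file. As a sketch of just that step (hex_to_bytes is a hypothetical name, and a real hex editor is still more convenient), here is a C parser for text like 8d 04 37 c3. From the command line, xxd -r -p does the same conversion.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Convert hex text such as "8d 04 37 c3" into raw bytes, ignoring
   whitespace. Returns the number of bytes written, or -1 on a bad
   digit, an odd number of digits, or if `out` is too small. */
static long hex_to_bytes(const char *text, uint8_t *out, size_t cap)
{
    assert(text != NULL && (out != NULL || cap == 0));
    size_t n = 0;
    int hi = -1;                           /* pending high nibble, -1 if none */
    for (const char *p = text; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (isspace(c))
            continue;
        int v;
        if (c >= '0' && c <= '9')      v = c - '0';
        else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
        else return -1;
        if (hi < 0) {
            hi = v;
        } else {
            if (n == cap) return -1;
            out[n++] = (uint8_t)(hi << 4 | v);
            hi = -1;
        }
    }
    return hi < 0 ? (long)n : -1;
}
```

Feeding it "8d 04 37 c3" yields the 4 bytes of lea eax, [rdi+rsi]; ret, which you could then write to a file.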

If you write machine code, you will need to learn how to write the binary headers required by your OS as well. I would recommend doing so and testing with an assembler in raw output format first; once you understand the binary layout it is a purely mechanical task to hand-assemble this to machine code.
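
A first concrete step with those headers is recognizing them. As a sketch (the enum and function names are hypothetical), this checks the 4-byte magic number at the start of a file: a 64-bit little-endian Mach-O file (as on x86-64) begins with MH_MAGIC_64, 0xfeedfacf, stored as cf fa ed fe, and an ELF file begins with 7f 'E' 'L' 'F'. Raw machine code with no header matches neither, and universal ("fat") Mach-O files are not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum obj_format { FMT_UNKNOWN, FMT_MACHO64_LE, FMT_ELF };

/* Classify an object/executable file by its first 4 bytes. */
static enum obj_format classify_header(const uint8_t *buf, size_t len)
{
    static const uint8_t macho64_le[4] = { 0xcf, 0xfa, 0xed, 0xfe };
    static const uint8_t elf[4]        = { 0x7f, 'E', 'L', 'F' };
    assert(buf != NULL || len == 0);
    if (len < 4)
        return FMT_UNKNOWN;
    if (memcmp(buf, macho64_le, 4) == 0)
        return FMT_MACHO64_LE;
    if (memcmp(buf, elf, 4) == 0)
        return FMT_ELF;
    return FMT_UNKNOWN;
}
```

Everything after the magic number (CPU type, load commands, segments) is the part you'd need to write by hand before the OS will run your bytes.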

Upvotes: 5

Earlz

Reputation: 63825

You would use a hex editor. Instead of doing that, though, I recommend learning assembly first. Assembly is basically a language with a 1:1 correspondence between human-readable mnemonics and the machine-readable hex bytes. For that, you would probably like to look at http://ref.x86asm.net/ and find an assembler that works on x86 Macs. I believe yasm should work.

Writing anything directly in hex is extremely difficult, and your time would probably be better spent learning assembly and the underlying machine code that an assembler generates.

Upvotes: 3

Ed Swangren

Reputation: 124642

A compiler turns your non-machine code into machine code... so you would not need a compiler...

Upvotes: 0
