J D

Reputation: 48697

Declarative representation of the x86 instruction set

I'm writing an x86 backend for a compiler and am finding it really tedious to encode the machine code for each assembly instruction I need, and I'm obviously reinventing the wheel. Is there a declarative representation of this instruction set anywhere, e.g. an XML file mapping instruction operations and operands to bytes?
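To make the question concrete, here is a minimal sketch in C of the kind of table being asked for. All names (`InsnDesc`, `TABLE`, `encode_rr`, `encode_ri`, the register enum) are hypothetical and not taken from any existing project. It assumes 32-bit mode and covers only a few register-direct and imm32 forms: no prefixes, REX, memory operands or imm8 short forms. Each row maps a mnemonic plus operand form to an opcode byte and, for immediate forms, the /digit extension that goes in the ModR/M reg field; one generic encoder per form turns a row into bytes.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* 32-bit registers in ModR/M numbering order. */
typedef enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI } Reg32;

/* MR: op r/m32, r32    MI: op r/m32, imm32 */
typedef enum { FORM_MR, FORM_MI } Form;

/* One row of the (hypothetical) declarative description. */
typedef struct {
    const char *mnemonic;
    Form form;
    uint8_t opcode;
    uint8_t ext; /* "/digit" opcode extension in ModR/M.reg (MI forms only) */
} InsnDesc;

static const InsnDesc TABLE[] = {
    { "add", FORM_MR, 0x01, 0 },
    { "sub", FORM_MR, 0x29, 0 },
    { "xor", FORM_MR, 0x31, 0 },
    { "mov", FORM_MR, 0x89, 0 },
    { "add", FORM_MI, 0x81, 0 },
    { "sub", FORM_MI, 0x81, 5 },
    { "cmp", FORM_MI, 0x81, 7 },
};

/* ModR/M byte with mod = 11 (register-direct addressing). */
static uint8_t modrm_reg(uint8_t reg, uint8_t rm) {
    return (uint8_t)(0xC0 | ((reg & 7) << 3) | (rm & 7));
}

static const InsnDesc *lookup(const char *mn, Form form) {
    for (size_t i = 0; i < sizeof TABLE / sizeof TABLE[0]; i++)
        if (TABLE[i].form == form && strcmp(TABLE[i].mnemonic, mn) == 0)
            return &TABLE[i];
    return NULL;
}

/* "mn dst, src" with two registers. out needs room for 2 bytes.
   Returns the number of bytes written, or 0 if the form is not in the table. */
size_t encode_rr(uint8_t *out, const char *mn, Reg32 dst, Reg32 src) {
    const InsnDesc *d = lookup(mn, FORM_MR);
    if (!d) return 0;
    out[0] = d->opcode;
    out[1] = modrm_reg((uint8_t)src, (uint8_t)dst);
    return 2;
}

/* "mn dst, imm32". out needs room for 6 bytes. Same return convention. */
size_t encode_ri(uint8_t *out, const char *mn, Reg32 dst, uint32_t imm) {
    const InsnDesc *d = lookup(mn, FORM_MI);
    if (!d) return 0;
    out[0] = d->opcode;
    out[1] = modrm_reg(d->ext, (uint8_t)dst);
    for (int i = 0; i < 4; i++)
        out[2 + i] = (uint8_t)(imm >> (8 * i)); /* little-endian imm32 */
    return 6;
}
```

For example, `encode_rr(buf, "add", EAX, ECX)` should produce `01 C8` (`add eax, ecx`) and `encode_ri(buf, "sub", ECX, 0x10)` should produce `81 E9 10 00 00 00`. A complete description also has to cover memory operands (SIB byte, displacements), prefixes and short immediate forms, which is where most of the tedium lies.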

Upvotes: 4

Views: 569

Answers (2)

Attila Lendvai

Reputation: 1351

I am assuming below that you don't want to depend on something huge like LLVM at runtime.

The reason I have researched this question is that I want to add a machine code emitter to a self-hosting Lisp whose size is in the ballpark of 2000-3000 LoC. Settling for the current LLVM dependency, or the GNU assembler dependency, would invalidate the very ideal behind this project: self-host from as little code as feasible.

Here's what I have found so far:

First, there's LLVM's TableGen

It's not trivial to use, and to generate your own code from its declarative description you'll need to write C++ code (unless you're ready to parse and process its format yourself). It's comprehensive, but not the simplest option.

https://llvm.org/docs/TableGen/index.html

lib/Target/X86/X86InstrInfo.td

llvm-tblgen-10 --help

There's a simpler one in Previous, the NeXT emulator

This is basically a web of C #define macros that can be processed relatively simply (see a Lisp example).
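To illustrate the macro style, here is a hypothetical sketch in C of how such a header can layer one-line instruction definitions on top of a few generic encoding macros. The macro names, the global code buffer and the dst, src operand order are made up for this example and do not reproduce the actual contents of codegen_x86.h; only 32-bit register-direct forms are shown.

```c
#include <stdint.h>
#include <stddef.h>

/* 32-bit register numbers as encoded in ModR/M fields. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Hypothetical output buffer the macros append to (no overflow checks). */
static uint8_t code_buf[256];
static size_t code_len;

/* Lowest layer: emit raw bytes (imm32 is emitted little-endian). */
#define EMIT8(b)   (code_buf[code_len++] = (uint8_t)(b))
#define EMIT32(v)  (EMIT8(v), EMIT8((v) >> 8), EMIT8((v) >> 16), EMIT8((v) >> 24))

/* ModR/M byte: mod (2 bits) | reg (3 bits) | rm (3 bits). */
#define MODRM(mod, reg, rm) EMIT8(((mod) << 6) | (((reg) & 7) << 3) | ((rm) & 7))

/* Generic encoding shapes, register-direct (mod = 3). */
/* op r/m32, r32: source register goes in reg, destination in rm. */
#define OP_RR(op, dst, src)      (EMIT8(op), MODRM(3, (src), (dst)))
/* op r/m32, imm32: the /digit extension goes in reg. */
#define OP_RI(op, ext, dst, imm) (EMIT8(op), MODRM(3, (ext), (dst)), EMIT32(imm))

/* The "declarative" part: one line per instruction. */
#define X86_ADD_RR(d, s)  OP_RR(0x01, d, s)
#define X86_SUB_RR(d, s)  OP_RR(0x29, d, s)
#define X86_MOV_RR(d, s)  OP_RR(0x89, d, s)
#define X86_ADD_RI(d, i)  OP_RI(0x81, 0, d, i)
#define X86_SUB_RI(d, i)  OP_RI(0x81, 5, d, i)
#define X86_CMP_RI(d, i)  OP_RI(0x81, 7, d, i)
```

Because every instruction is a one-line `#define` over a small set of shapes such as `OP_RR` and `OP_RI`, another tool can recover the opcode and /digit values with a fairly simple pattern match over the header text, without compiling it.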

Here are the copies/versions that I have found online:

https://github.com/cebix/macemu/blob/master/BasiliskII/src/uae_cpu/compiler/codegen_x86.h

https://unix.superglobalmegacorp.com/previous/newsrc/src/cpu/jit/codegen_x86.h.html

http://previous.unixdude.net/

https://github.com/probonopd/previous/blob/master/src/cpu/jit/codegen_x86.h

Upvotes: 1

Josh Haberman

Reputation: 4220

I highly recommend using DynASM for this. It's not a declarative description, but it gives you absolute control over what instructions are emitted, and it's much easier to use than a declarative description would be. It's the ideal way of writing a platform-specific codegen IMO.

It is also very small and unimposing: the runtime is completely contained within a few hundred lines of .h files.

See my DynASM tutorial for an example of writing a very simple codegen with DynASM.

Even if you're not convinced about DynASM, you'll find in the DynASM codebase a pretty concise declarative description of x86 instructions, which you might find useful.

Upvotes: 4
