John
John

Reputation: 247

Building an assembler

I need to build an assembler for a CPU architecture that I've built. The architecture is similar to MIPS, but this is of no importance.

I started using C#, although C++ would be more appropriate. (C# means faster development time for me).

My only problem is that I can't come with a good design for this application. I am building a 2 pass assembler. I know what I need to do in each pass.\

I've implemented the first pass and I realised that if I have to lines assembly code on the same line ...no error is thrown.This means only one thing poor parsing techniques.

So almighty programmers, fathers of assembler enlighten me how should I proceed. I just need to support symbols and data declaration. Instructions have fixed size.

Please let me know if you need more information.

Upvotes: 15

Views: 7822

Answers (4)

Albert van der Horst
Albert van der Horst

Reputation: 943

If you are to write an assembler that just works, and spits out a hex file to be loaded on a microcontroller, it can be simple and easy. Part of my ciforth library is a full Pentium assembler to add inline definitions, of about 150 lines. There is an assembler for the 8080 of a couple dozen lines.

The principle is explained http://home.hccnet.nl/a.w.m.van.der.horst/postitfixup.html . It amounts to applying the blackboard design pattern to the problem. You start with laying down the instruction, leaving holes for any and all operands. Then you fill in the holes, when you encounter the parameters.
There is a strict separation between the generic tool and the instruction set.

In case the assembler you need is just for yourself, and there are no requirements than usability (not a homework assignment), you can have an example implementation in http://home.hccnet.nl/a.w.m.van.der.horst/forthassembler.html. If you dislike Forth, there is also an example implementation in Perl. If the Pentium instruction set is too much too chew, then still you must be able to understand the principle and the generic part. You're advised to have a look at the asi8080.frt file first. This is 389 WOC (Words Of Code, not Lines Of Code). An experienced Forther familiar with the instruction set can crank out an assembler like that in an evening. The Pentium is a bitch.

Upvotes: 1

Hernán
Hernán

Reputation: 4587

Look at this Assembler Development Kit from Randy Hyde's author of the famous "The Art of Assembly Language":

The Assembler Developer's Kit

Upvotes: 5

plinth
plinth

Reputation: 49189

I've written three or four simple assemblers. Without using a parser generator, what I did was model the S-C assembler that I knew best for 6502.

To do this, I used a simple syntax - a line was one of the following:

nothing
[label] [instruction] [comment]
[label] [directive] [comment]

A label was one letter followed by any number of letters or numbers.

An instruction was <whitespace><mnemonic> [operands]

A directive was <whitespace>.XX [operands]

A comment was a * up to end of line.

Operands depended on the instruction and the directive.

Directives included .EQ equate for defining constants

.OR set origin address of code

.HS hex string of bytes

.AS ascii string of bytes - any delimiter except white space - whatever started it ended it

.TF target file for output

.BS n reserve block storage of n bytes

When I wrote it, I wrote simple parsers for each component. Whenever I encountered a label, I put it in a table with its target address. Whenever I encountered a label I didn't know, I marked the instruction as incomplete and put the unknown label with a reference to the instruction that needed fixing.

After all source lines had passed, I looked through the "to fix" table and tried to find an entry in the symbol table, if I did, I patched the instructions. If not, then it was an error.

I kept a table of instruction names and all the valid addressing modes for operands. When I got an instruction, I tried to parse each addressing mode in turn until something worked.

Given this structure, it should take a day maybe two to do the whole thing.

Upvotes: 14

ConcernedOfTunbridgeWells
ConcernedOfTunbridgeWells

Reputation: 66622

The first pass of a two-pass assembler assembles the code and puts placeholders for the symbols (as you don't know how big everything is until you've run the assembler). The second pass fills in the addresses. If the assembled code subsequently needs to be linked to external references, this is the job of the eponymous linker.

Upvotes: 2

Related Questions