Reputation: 323
I am interested in finding out how to compile/create a very simple language (i.e. brainfuck) without the help of high-level functions like unix system calls. I would like to write a compiler for the language in some CPU-dependent low-level assembly such that I can provide source code in the simple language and end up with a binary. Not sure if this is clear or not, but the basic question is how to convert source to binary without the help of anything that is not already present in the hardware.
Edit: A more concise problem statement...
GIVEN:
-Hardware (motherboard/CPU, etc.)
NOT GIVEN:
-UNIX/DOS
-C/FORTRAN/any other languages
How would I go about implementing a simple language like brainfuck?
I am aware that there are much more practical ways of compiling, but I am interested in this for educational purposes.
Sorry if this question is redundant or obvious - I am not a computer scientist so perhaps I just don't know the proper vocabulary to find a solution to the problem online. If anyone can provide a link or text on the subject it would be appreciated.
Upvotes: 2
Views: 729
Reputation: 195
Actually, I have a similar project on my mind too. What you want is to write a compiler that runs on bare hardware (without any operating systems). A compiler is just a program like every other program except that it will be running on bare hardware as in your case.
You might consider using a bootloader for your compiler (a bootloader is not an operating system). The bootloader normally loads an operating system from the disk into memory for execution only this time it loads your compiler rather than an operating system.
There are many free and open-source bootloaders out there (google for them). Choose one that suits your needs or if you're as curious/inquisitive as I am, you might even write your own.
Choosing a bootloader (or writing one) requires that you know how your target system boots - its either through BIOS (old) or UEFI (new. It is the replacement for BIOS). As your programming language matures, you could write your very own bootloader in your new programming language.
Hope this helps
Upvotes: 0
Reputation: 62066
You can compile brainfuck source code into DOS .COM apps quite easily (you'll also need NASM or some extra code to emit instruction opcodes and calculate jumps). Below is a slightly modified bf interpreter, turned into a compiler of sorts:
// file: bfcompil.c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_CODE_SIZE 30000
char code[MAX_CODE_SIZE];
char* pc = &code[0];
char* pcEnd = &code[0];
#define MAX_DATA_SIZE 30000
char data[MAX_DATA_SIZE] = { 0 };
char* pd = &data[0];
// Structures for quick bracket matching
unsigned brStack[MAX_CODE_SIZE];
unsigned brSptr = 0;
unsigned brMatch[MAX_CODE_SIZE];
int main(int argc, char** argv)
{
FILE* f = NULL;
int ch;
if (argc != 2)
{
fprintf(stderr, "usage:\n bfcompil <brainfuck-source-code-file>\n"
"bfcompil will output NASM-compilable source code for"
"a DOS program\n");
return EXIT_FAILURE;
}
if ((f = fopen(argv[1], "rb")) == NULL)
{
fprintf(stderr, "can't open file \"%s\" for reading\n", argv[1]);
return EXIT_FAILURE;
}
while ((ch = getc(f)) != EOF)
{
if (strchr(" \t\r\n", ch) != NULL) // skip white space
{
continue;
}
else if (strchr("><+-.,[]", ch) != NULL) // store valid commands
{
if (pcEnd >= &code[sizeof(code)])
{
fprintf(stderr, "too many commands in file \"%s\", expected at most "
"%u commands\n", argv[1], (unsigned)sizeof(code));
fclose(f);
return EXIT_FAILURE;
}
if (ch == '[')
{
brStack[brSptr++] = (unsigned)(pcEnd - &code[0]);
}
else if (ch == ']')
{
if (brSptr == 0)
{
fprintf(stderr, "unmatched ']' in file \"%s\"\n", argv[1]);
fclose(f);
return EXIT_FAILURE;
}
brSptr--;
brMatch[brStack[brSptr]] = (unsigned)(pcEnd - &code[0]);
brMatch[pcEnd - &code[0]] = brStack[brSptr];
}
*pcEnd++ = ch;
}
else // fail on invalid commands
{
fprintf(stderr, "unexpected character '%c' in file \"%s\", valid command "
"set is: \"><+-.,[]\"\n", ch, argv[1]);
fclose(f);
return EXIT_FAILURE;
}
}
fclose(f);
if (brSptr != 0)
{
fprintf(stderr, "unmatched '[' in file \"%s\"\n", argv[1]);
return EXIT_FAILURE;
}
if (pcEnd == &code[0])
{
fprintf(stderr, "no commands found in file \"%s\"\n", argv[1]);
return EXIT_FAILURE;
}
printf("; how to compile: nasm -f bin <input file with this code.asm> -o "
"<output executable.com>\n\n"
"org 0x100\n"
"bits 16\n\n"
" mov bx, data\n"
" mov di, bx\n"
" mov cx, 30000\n"
" xor al, al\n"
" cld\n"
" rep stosb\n\n"
" jmp code\n\n"
"print:\n"
" mov ah, 2\n"
" cmp byte [bx], 10\n"
" jne lprint1\n"
" mov dl, 13\n"
" int 0x21\n"
"lprint1:\n"
" mov dl, [bx]\n"
" int 0x21\n"
" ret\n\n"
#if 01
// buffered input
"input:\n"
" cmp byte [kbdbuf+1], 0\n"
" jne linput1\n"
" mov ah, 0xa\n"
" mov dx, kbdbuf\n"
" int 0x21\n"
" inc byte [kbdbuf+1]\n"
"linput1:\n"
" mov al, [kbdbuf+2]\n"
" cmp al, 13\n"
" jne linput4\n"
" mov al, 10\n"
"linput4:\n"
" mov [bx], al\n"
" mov si, kbdbuf+3\n"
" mov di, kbdbuf+2\n"
" xor cx, cx\n"
" dec byte [kbdbuf+1]\n"
" mov cl, [kbdbuf+1]\n"
" jz linput3\n"
"linput2:\n"
" lodsb\n"
" stosb\n"
" loop linput2\n"
"linput3:\n"
" ret\n\n"
#else
// unbuffered input
"input:\n"
" mov ah, 1\n"
" int 0x21\n"
" cmp al, 13\n"
" jne linput\n"
" mov al, 10\n"
"linput:\n"
" mov [bx], al\n"
" ret\n\n"
#endif
"code:\n\n");
for (pc = &code[0]; pc < pcEnd; pc++)
{
switch (*pc)
{
case '>':
printf(" inc bx\n");
break;
case '<':
printf(" dec bx\n");
break;
case '+':
printf(" inc byte [bx]\n");
break;
case '-':
printf(" dec byte [bx]\n");
break;
case '.':
printf(" call print\n");
break;
case ',':
printf(" call input\n");
break;
case '[':
printf("label%u:\n", (unsigned)(pc - &code[0]));
printf(" cmp byte [bx], 0\n");
printf(" je label%u\n", (unsigned)brMatch[pc - &code[0]]);
break;
case ']':
printf(" jmp label%u\n", brMatch[pc - &code[0]]);
printf("label%u:\n", (unsigned)(pc - &code[0]));
break;
}
}
printf("\n ret\n\n");
printf("kbdbuf:\n"
" db 254\n"
" db 0\n"
" times 256 db 0\n\n");
printf("data:\n");
return EXIT_SUCCESS;
}
If you feed it the hello world program:
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.
It'll produce compilable assembly code:
; how to compile: nasm -f bin <input file with this code.asm> -o <output executable.com>
org 0x100
bits 16
mov bx, data
mov di, bx
mov cx, 30000
xor al, al
cld
rep stosb
jmp code
print:
mov ah, 2
cmp byte [bx], 10
jne lprint1
mov dl, 13
int 0x21
lprint1:
mov dl, [bx]
int 0x21
ret
input:
cmp byte [kbdbuf+1], 0
jne linput1
mov ah, 0xa
mov dx, kbdbuf
int 0x21
inc byte [kbdbuf+1]
linput1:
mov al, [kbdbuf+2]
cmp al, 13
jne linput4
mov al, 10
linput4:
mov [bx], al
mov si, kbdbuf+3
mov di, kbdbuf+2
xor cx, cx
dec byte [kbdbuf+1]
mov cl, [kbdbuf+1]
jz linput3
linput2:
lodsb
stosb
loop linput2
linput3:
ret
code:
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
label10:
cmp byte [bx], 0
je label41
inc bx
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc bx
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc bx
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc bx
inc byte [bx]
dec bx
dec bx
dec bx
dec bx
dec byte [bx]
jmp label10
label41:
inc bx
inc byte [bx]
inc byte [bx]
call print
inc bx
inc byte [bx]
call print
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
call print
call print
inc byte [bx]
inc byte [bx]
inc byte [bx]
call print
inc bx
inc byte [bx]
inc byte [bx]
call print
dec bx
dec bx
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
inc byte [bx]
call print
inc bx
call print
inc byte [bx]
inc byte [bx]
inc byte [bx]
call print
dec byte [bx]
dec byte [bx]
dec byte [bx]
dec byte [bx]
dec byte [bx]
dec byte [bx]
call print
dec byte [bx]
dec byte [bx]
dec byte [bx]
dec byte [bx]
dec byte [bx]
dec byte [bx]
dec byte [bx]
dec byte [bx]
call print
inc bx
inc byte [bx]
call print
inc bx
call print
ret
kbdbuf:
db 254
db 0
times 256 db 0
data:
If you compile it, you'll be able to run it in DOS, Windows 9x/XP (probably 32-bit Vista/7) and in DosBox.
The output, unsurprisingly, is:
Hello World!
UPDATE: The DOS-based input and output routines in the above code can be replaced by direct accesses to the screen buffer and keyboard ports. The keyboard code will also need to handle keyboard interrupts. It's not very hard to do on the x86 PC. You really can implement a compiler for a language to run on bare hardware without OS.
You should also take a look at Forth
as that's exactly the kind of language for the given environment. And it's easy to implement. Much easier than C. Harder than brainfuck, somewhat comparable with assembly.
UPDATE 2: Here's a small (~1KB in size) brainfuck interpreter that's not using any DOS or BIOS functionality:
; file: bfint.asm
; compile: nasm.exe -f bin bfint.asm -o bfint.com
; run in: DOS, DosBox or equivalent
bits 16
org 0x100
section .text
SCREEN_WIDTH equ 80
SCREEN_HEIGHT equ 25
SCAN_BUF_SIZE equ 256
MAX_CODE_SIZE equ 20000
MAX_DATA_SIZE equ 30000
cld
; set new keyboard (IRQ1) ISR
push byte 0
pop es
cli ; update ISR address w/ ints disabled
mov word [es:9*4], Irq1Isr
mov [es:9*4+2], cs
sti
push cs
pop es
Restart:
call ClearScreen
mov si, MsgHello
call PrintStr
mov word [CodeSize], 0
mov byte [EnterCount], 0
WaitForKey:
call GetKey
; Escape erases code
cmp ah, 1 ; Escape
je Restart
; Non-characters are ignored
cmp al, 0 ; non-character key
je WaitForKey
; Enter is "printed" but not stored, use for formatting
cmp al, 10 ; Enter
je KeyEnter
mov byte [EnterCount], 0
; Backspace deletes last character
cmp al, 8 ; Backspace
je KeyBackspace
; Space is printed but not stored, use for formatting
cmp al, " " ; Space
je PrintOnly
; 0 runs a test program
cmp al, "0"
je TestProgram
; Other chracters are stored as code
mov bx, [CodeSize]
cmp bx, MAX_CODE_SIZE
jae ErrCodeTooBig
mov [Code + bx], al
inc word [CodeSize]
PrintOnly:
call PrintChar
jmp WaitForKey
ErrCodeTooBig:
mov si, MsgCodeTooBig
call PrintStr
mov word [CodeSize], 0
jmp WaitForKey
KeyEnter:
call PrintChar
inc byte [EnterCount]
cmp byte [EnterCount], 1
je WaitForKey
mov byte [EnterCount], 0
call Execute
jmp WaitForKey
KeyBackspace:
call PrintChar
cmp word [CodeSize], 0
je WaitForKey
dec word [CodeSize]
jmp WaitForKey
TestProgram:
mov si, TestCode
mov di, Code
mov cx, TestCodeEnd - TestCode
mov [CodeSize], cx
rep movsb
call Execute
jmp WaitForKey
Execute:
mov si, Code ; code start
xor bp, bp ; instruction index
mov di, Data ; data start
mov cx, MAX_DATA_SIZE
xor al, al
rep stosb
sub di, MAX_DATA_SIZE
xor bx, bx ; data index
ExecuteLoop:
cmp bp, [CodeSize]
jae ExecuteDone
mov al, [bp+si]
cmp al, ">"
je IncPtr
cmp al, "<"
je DecPtr
cmp al, "+"
je IncData
cmp al, "-"
je DecData
cmp al, "."
je PrintData
cmp al, ","
je InputData
cmp al, "["
je While
cmp al, "]"
je EndWhile
mov si, MsgInvalidChar
call PrintStr
call PrintChar
mov al, 10
call PrintChar
jmp ExecuteDone
IncPtr:
inc bx
jmp ExecuteContinue
DecPtr:
dec bx
jmp ExecuteContinue
IncData:
inc byte [bx+di]
jmp ExecuteContinue
DecData:
dec byte [bx+di]
jmp ExecuteContinue
PrintData:
mov al, [bx+di]
call PrintChar
jmp ExecuteContinue
InputData:
call GetKey
or al, al
jz InputData
mov [bx+di], al
jmp ExecuteContinue
While:
cmp byte [bx+di], 0
jne ExecuteContinue
mov ax, 1
mov dx, "[]"
call FindMatchingBracket
ExecuteContinue:
inc bp
jmp ExecuteLoop
EndWhile:
mov ax, -1
mov dx, "]["
call FindMatchingBracket
jmp ExecuteLoop
ExecuteDone:
mov word [CodeSize], 0
mov si, MsgCompleted
jmp PrintStr
FindMatchingBracket:
xor cx, cx
FindMatchingBracket1:
cmp byte [bp+si], dl
jne FindMatchingBracket2
inc cx
jmp FindMatchingBracket3
FindMatchingBracket2:
cmp byte [bp+si], dh
jne FindMatchingBracket3
dec cx
jnz FindMatchingBracket3
ret
FindMatchingBracket3:
add bp, ax
jmp FindMatchingBracket1
; Inputs:
; AL = ASCII character code
PrintChar:
; assuming it's a color text mode (not monochrome or graphics)
pusha
push es
push word 0xb800
pop es
mov bx, [CursorPos]
cmp al, 8
je PrintCharBackSpace
cmp al, 10
je PrintCharBackLF
cmp al, 13
je PrintCharBackCR
mov [es:bx], al
call AdvanceCursorPosition
jmp PrintCharDone
PrintCharBackSpace:
; move the cursor back and erase the last character
or bx, bx
jz PrintCharDone
dec bx
dec bx
mov word [es:bx], 0x0720
jmp PrintCharSetCursorPos
PrintCharBackLF:
; move the cursor to the beginning of the next line - '\n' behavior
add bx, SCREEN_WIDTH * 2
cmp bx, SCREEN_WIDTH * SCREEN_HEIGHT * 2
jc PrintCharBackCR
sub bx, SCREEN_WIDTH * 2
call ScrollUp
PrintCharBackCR:
; move the cursor to the beginning of the current line - '\r' behavior
mov ax, SCREEN_WIDTH * 2
xchg ax, bx
xor dx, dx
div bx
mul bx
mov bx, ax
PrintCharSetCursorPos:
mov [CursorPos], bx
shr bx, 1
call SetCursorPosition
PrintCharDone:
PopEsAllRet:
pop es
popa
ret
ClearScreen:
; assuming it's a color text mode (not monochrome or graphics)
pusha
push es
push word 0xb800
pop es
xor di, di
mov cx, SCREEN_WIDTH * SCREEN_HEIGHT
mov ax, 0x0720 ; character = space, color = lightgray on black
rep stosw
xor bx, bx
mov [CursorPos], bx
call SetCursorPosition
jmp PopEsAllRet
ScrollUp:
; assuming it's a color text mode (not monochrome or graphics)
pusha
push es
push ds
push word 0xb800
pop es
push es
pop ds
mov si, SCREEN_WIDTH * 2
xor di, di
mov cx, SCREEN_WIDTH * (SCREEN_HEIGHT - 1)
rep movsw
mov cx, SCREEN_WIDTH
mov ax, 0x0720 ; character = space, color = lightgray on black
rep stosw
pop ds
jmp PopEsAllRet
; Inputs:
; DS:SI = address of NUL-terminated ASCII string
PrintStr:
pusha
PrintStr1:
lodsb
or al, al
jz PrintStrDone
call PrintChar
jmp PrintStr1
PrintStrDone:
popa
ret
; Inputs:
; BX = Y * SCREEN_WIDTH + X
SetCursorPosition:
; assuming it's a color text mode (not monochrome or graphics)
pusha
%if 0
mov dx, 0x3d4
mov al, 0x0f
out dx, al
inc dx
mov al, bl
out dx, al
dec dx
mov al, 0x0e
out dx, al
inc dx
mov al, bh
out dx, al
%else
mov dx, 0x3d4
mov al, 0x0f
mov ah, bl
out dx, ax
dec al
mov ah, bh
out dx, ax
%endif
popa
ret
AdvanceCursorPosition:
; assuming it's a color text mode (not monochrome or graphics)
pusha
mov ax, [CursorPos]
inc ax
inc ax
cmp ax, SCREEN_WIDTH * SCREEN_HEIGHT * 2
jc AdvanceCursorPosition1
sub ax, SCREEN_WIDTH * 2
call ScrollUp
AdvanceCursorPosition1:
mov [CursorPos], ax
shr ax, 1
xchg ax, bx
call SetCursorPosition
popa
ret
; Outputs:
; AH = scan code
; AL = character
GetKey:
push bx
push si
GetKeyRepeat:
mov ax, [ScanWriteIdx]
mov si, [ScanReadIdx]
sub ax, si
jz GetKeyRepeat
mov bx, si
mov ax, [ScanBuf + bx + si]
inc si
and si, SCAN_BUF_SIZE - 1
mov [ScanReadIdx], si
pop si
pop bx
ret
Irq1Isr:
pusha
push ds
push cs
pop ds
; read keyboard scan code
in al, 0x60
cmp al, 0x2a ; Left Shift down
jne Irq1Isr1
or byte [Shift], 1
Irq1Isr1:
cmp al, 0x36 ; Right Shift down
jne Irq1Isr2
or byte [Shift], 2
Irq1Isr2:
cmp al, 0xaa ; Left Shift up
jne Irq1Isr3
and byte [Shift], ~1
Irq1Isr3:
cmp al, 0xb6 ; Right Shift up
jne Irq1Isr4
and byte [Shift], ~2
Irq1Isr4:
test al, 0x80
jnz Irq1IsrEois ; key released
mov ah, al
cmp al, 58
jc Irq1Isr5
xor al, al ; don't translate non-character keys
jmp Irq1Isr7
Irq1Isr5:
mov bx, ScanToChar
cmp byte [Shift], 0
je Irq1Isr6
add bx, ScanToCharShift - ScanToChar
Irq1Isr6:
xlatb
Irq1Isr7:
mov bx, [ScanWriteIdx]
mov di, bx
mov [ScanBuf + bx + di], ax
inc bx
and bx, SCAN_BUF_SIZE - 1
mov [ScanWriteIdx], bx
Irq1IsrEois:
%if 0
; send EOI to XT keyboard
in al, 0x61
mov ah, al
or al, 0x80
out 0x61, al
mov al, ah
out 0x61, al
%endif
; send EOI to master PIC
mov al, 0x20
out 0x20, al
pop ds
popa
iret
ScanToChar:
db 0 ; unused
db 0 ; Escape
db "1234567890-="
db 8 ; Backspace
db 9 ; Tab
db "qwertyuiop[]"
db 10 ; Enter
db 0 ; Ctrl
db "asdfghjkl;'`"
db 0 ; Left Shift
db "\zxcvbnm,./"
db 0 ; Right Shift
db 0 ; Print Screen
db 0 ; Alt
db " " ; Space
ScanToCharShift:
db 0 ; unused
db 0 ; Escape
db "!@#$%^&*()_+"
db 8 ; Backspace
db 9 ; Tab
db "QWERTYUIOP{}"
db 10 ; Enter
db 0 ; Ctrl
db 'ASDFGHJKL:"~'
db 0 ; Left Shift
db "|ZXCVBNM<>?"
db 0 ; Right Shift
db 0 ; Print Screen
db 0 ; Alt
db " " ; Space
MsgHello:
db "Brainfuck Interpreter", 10, 10
db "Press 0 to run test code OR", 10
db "Type your code.", 10
db "Use Esc to erase it all or Backspace to delete last character.", 10
db "Press Enter twice to run it.", 10, 10, 0
MsgCodeTooBig:
db 10, "Code's too big", 10, 0
MsgCompleted:
db 10, "Code's completed", 10, 0
MsgInvalidChar:
db 10, "Invalid character: ", 0
Shift db 0
CursorPos dw 0
ScanReadIdx dw 0
ScanWriteIdx dw 0
EnterCount db 0
CodeSize dw 0
TestCode:
; Hello World!
db "++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>."
; Squares of 0 through 100
; db "++++[>+++++<-]>[<+++++>-]+<+[>[>+>+<<-]++>>[<<+>>-]>>>[-]++>[-]+>>>+[[-]++++++>>>]<<<[[<++++++++<++>>-]+<.<[>----<-]<]<<[>>>>>[>>>[-]+++++++++<[>-<-]+++++++++>[-[<->-]+[<<<]]<[>+<-]>]<<-]<<-]"
; ROT13
; db "+[,+[-[>+>+<<-]>[<+>-]+>>++++++++[<-------->-]<-[<[-]>>>+[<+<+>>-]<[>+<-]<[<++>>>+[<+<->>-]<[>+<-]]>[<]<]>>[-]<<<[[-]<[>>+>+<<<-]>>[<<+>>-]>>++++++++[<-------->-]<->>++++[<++++++++>-]<-<[>>>+<<[>+>[-]<<-]>[<+>-]>[<<<<<+>>>>++++[<++++++++>-]>-]<<-<-]>[<<<<[-]>>>>[<<<<->>>>-]]<<++++[<<++++++++>>-]<<-[>>+>+<<<-]>>[<<+>>-]+>>+++++[<----->-]<-[<[-]>>>+[<+<->>-]<[>+<-]<[<++>>>+[<+<+>>-]<[>+<-]]>[<]<]>>[-]<<<[[-]<<[>>+>+<<<-]>>[<<+>>-]+>------------[<[-]>>>+[<+<->>-]<[>+<-]<[<++>>>+[<+<+>>-]<[>+<-]]>[<]<]>>[-]<<<<<------------->>[[-]+++++[<<+++++>>-]<<+>>]<[>++++[<<++++++++>>-]<-]>]<[-]++++++++[<++++++++>-]<+>]<.[-]+>>+<]>[[-]<]<]"
TestCodeEnd:
section .bss
ScanBuf:
resw SCAN_BUF_SIZE
Code:
resb MAX_CODE_SIZE
Data:
resb MAX_DATA_SIZE
If you want to take DOS (as the hosting environment) and NASM out of the picture, you're welcome to encode the above assembly code by hand, make a bootable floppy out of it and boot it.
Upvotes: 1
Reputation: 71536
Looking at the description on wikipedia this is not a difficult task. I would still probably start in some language you know and perhaps like, perhaps not. C is a good choice. File I/O is a small or huge project depending on the platform, etc. Worry about that later, compile in the "source" for the language. for each character in that source, perform the task
> ++ptr;
< --ptr;
+ ++*ptr;
etc
then translate that to assembly. you only need one register to hold ptr, a few lines of asm to initialize the array/ram and set the register/ptr to the beginning. Another register to walk through the source code. you are only looking for 8 characters, you could if-then-else your way through those unless there is a bit pattern thing that makes them easier to deal with. if you want you could make a 256byte look up table and use that either for the address to the handler for that instruction or to convert them into a integer from 0-7, and use that in a jump table, whatever.
Well that is a parser, not necessarily a compiler. I would write the compiler in C or some high level language, it takes in an array of bytes which is the program, for each instruction you output the asm source code that implements that instruction, you get a less than byte on input, output (using ARM asm)
add r0,#1
a minus sign
ldr r1,[r0]
sub r1,#1
str r1,[r0]
r0 being the ptr register and r1 just helping out.
If you are really opposed to using calls like printf, then make the output of this code an array of bytes which are the ascii for the asm source out put each character a, d, d, space, r, 0, comma, #, 1, cr, lf and so on. Fairly easy to implement in asm as well as some high level language. If you want to go straight to binary then simply output machine code, even easier.
Getting the source string into this compiler and the output into some file that can be then later executed is likely going to take system calls. You can avoid the output being a file if you are running on the same platform and can do self-modifying code in the sense that you build up the machine code at some address, and then when finished parsing you jump to that address to execute.
it took many times longer to write this answer than it would have been to implement the solution in C or asm. What is the exact difficulty you are having?
Upvotes: 1
Reputation: 3431
The canonical compiler-learning book is the Dragon Book, http://dragonbook.stanford.edu/ .
However, it really is pointed at more... SOPHISTICATED languages. You might not want to talk about context-free parsing and the like (although I do recommend that book, it's double-plus-hard but AWESOME.)
With that in mind, you might want to start by finding an interpreter or compiler for a really simple language-- maybe Brainfuck itself, maybe something a little more usable like a Scheme implementation. Read, analyze, learn what it does. Implement whichever lower-level library functions that compiler uses, adjust its code generator to output whatever brand of machine code you want to target, and you're done.
Upvotes: 0