Reputation:
I'm interested in compilers, and writing a basic one in C++ simply for the experience! I understand how a compiler parses source code, and then creates a token tree. What I don't understand is how that tree is evaluated, and then if necessary returns a value. For example if there is a statement (a + b), would I have a function to handle the + token, which would be passed a and b? Would it follow that I would do the same with comparison operations, and then even if statements?
Upvotes: 0
Views: 109
Reputation: 370202
Compilers don't evaluate the AST, that's what (naive) interpreters do. Compilers generate code from the AST.
Evaluating a simple int-only AST might look something like this, assuming that an AST node consists of an enum telling the type of node and an array of child nodes:
int eval_expression(node) {
switch(node.type) {
case ADD:
return eval_expression(node.children[0]) + eval_expression(node.children[1]);
case IF:
if(eval_expression(node.children[0])) {
return eval_expression(node.children[1]);
} else {
return eval_expression(node.children[2]);
}
// and so on
}
}
Depending on your language and how you represent the AST, this might look a lot different, but hopefully this gives you an idea.
What a (very simple) compiler might do, would look more like this:
void compile_expression(Node node, const char* target_register) {
switch(node.type) {
case ADD:
const char* temp_register = find_unused_register();
compile_expression(node.children[0], target_register);
compile_expression(node.children[1], temp_register);
printf("ADD %s %s\n", target_register, temp_register);
free_register(temp_register);
case IF:
const char* condition_register = find_unused_register();
compile_expression(node.children[0], condition_register);
const char* elseLabel = generate_label();
const char* labelAfterIf = generate_label();
// If the condition was zero, jump to the else case
printf("JZ %s %s\n", condition_register, elseLabel);
compile_expression(node.children[1], target_register);
printf("JUMP %s\n", labelAfterIf);
printf("%s:\n", elseLabel);
compile_expression(node.children[2], target_register);
printf("%s:\n", labelAfterIf);
free_register(temp_register);
// and so on
}
}
The above code writes assembly code directly to stdout, which isn't quite what a real compiler would do. It's also full of bad engineering practices and assumes a rather simplified assembly dialect as the target. Hopefully it gets the idea across though.
Note that real-world compilers won't generate assembly (or machine code) directly from the AST, nor will interpreters directly evaluate the AST. Instead both will first generate some form of intermediate code (like three-address code) from the AST and then work further with that.
Upvotes: 1