Reputation: 1291
I'm designing a interpreter which uses recursive descent and I've got to the point where I'm starting to implement built in methods.
One example of a method I'm implementing is the print()
method which outputs to the console, just like python's print()
method and Java's System.out.println()
.
However it has come to my attention that there are multiple ways of implementing these built in methods. I'm sure there is many more but I have determined 2 viable ways of achieving this and I'm trying to establish which way is the best practise. For context below is the different layers I'm using within my interpreter which is loosely based off of https://www.geeksforgeeks.org/introduction-of-compiler-design/ and other tutorials I've come across.
1. Creating a AST Node for each individual built in method.
This method entails programming the parser to generate a node for each individual method. This means that a unique node will exist for each method. For example:
The parser will look to generate a node when a TPRINT
token is found in the lexer.
print : TPRINT TLPAREN expr TRPAREN {$$ = new Print($3);}
;
And this is what the print class looks like.
class Print : public Node {
public:
virtual VariableValue visit_Semantic(SemanticAnalyzer* analyzer) override;
virtual VariableValue visit_Interpreter(Interpreter* interpreter) override;
Node* value;
Print(Node* cvalue) {
value = cvalue;
}
}
From there I define the visit_Semantic
and visit_interpreter
methods, and visit them using recursion from the top node.
I can think of a couple advantages/disadvantages to using this method:
Advantages
Print
node's visit_interpreter
method, it can straight away execute the response directly as it is programmed into it's visit methods.Disadvantages
2. Creating a Generic AST Node for a method call Node and then use a lookup table to determine which method is being called.
This involves creating one generic node MethodCall
and grammar to determine if a method has been called, with some unique identifier such as a string for the method it is referring to. Then, when the MethodCall
's visit_Interpreter
or visit_Semantic
method is called, it looks up in a table which code to execute.
methcall : TIDENTIFIER TLPAREN call_params TRPAREN {$$ = new MethodCall($1->c_str(), $3);}
;
MethodCall
Node. Here the unique identifier is std::string methodName
:
class MethodCall : public Node {
public:
virtual VariableValue visit_Semantic(SemanticAnalyzer* analyzer) override;
virtual VariableValue visit_Interpreter(Interpreter* interpreter) override;
std::string methodName;
ExprList *params;
MethodCall(std::string cmethodName, ExprList *cparams) {
params = cparams;
methodName = cmethodName;
}
};
Advantages:
Disadvantages:
std::string methodName
will have to be compared in a lookup table to determine a response for it. This is not as efficient as directly programming in a response to a Node's visit methods.Which practise is the best way of handling methods within a compiler/interpreter? Are there different practises all together which are better, or are there any other disadvantages/advantages I am missing?
I'm fairly new to compiler/interpreter design so please ask me to clarify if I have got some terminology wrong.
Upvotes: 3
Views: 142
Reputation: 862
As far as I see it, you have to split things up into methods somewhere. The question is, do you want to implement this as part of the parser definition (solution 1) or do you want to implement this on the C++ side (solution 2).
Personally, I would prefer to keep the parser definition simple and move this logic to the C++ side, that is solution 2.
From a runtime perspective of solution 2, I wouldn't worry too much about that. But in the end, it depends on how frequently that method is called and how many identifiers you have. Having just a few identifiers is different from comparing say hundreds of strings in an "else if" manner.
You could first implement it the simple straight-forward way, i.e. match the string identifiers in an "else if" manner, and see if you run into runtime issues.
If you experience runtime issues, you could use a hash function. The "hardcore" way would be to implement an optimal hash function yourself and check hash function optimality offline, since you know the set of string identifiers. But for your application that would be probably be rather overkill or for academic purposes, and I would recommend to just use unordered_map
from the STL (which uses hashing under the hood, see also How std::unordered_map is implemented) to map your string identifiers to index numbers, so that you can implement your jump table with an efficient switch
operation on those index numbers.
Upvotes: 4
Reputation: 3237
You should most definitely use a table lookup. It makes things much easier for you. Also, think about the functions users define! Then you'll definitely need a table.
Upvotes: 1