Reputation: 59
I am writing a handcrafted lexer and parser in C++. I have written the lexer in such a way that if it finds, for example, a ; it prints "SEMICOLON", if it finds while it prints "KEYWORD", if it finds hello it prints "IDENTIFIER", and so on. However, I now need to pass these tokens to a parser. How can this be done, for example using a list? I have also found that I need to store both the token type and the token value.
Upvotes: 4
Views: 1863
Reputation: 599
Use a std::map, for example:
#include <map>
#include <string>

std::map<std::string, std::string> my_map = {
    { ";", "SEMICOLON" },
    { "while", "KEYWORD" },
    ...
};
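After your lexer has isolated a lexeme, the lookup could then be something like this (lexeme is just a placeholder for whatever string your lexer produced):
std::string lexeme = "while";
auto it = my_map.find(lexeme);
if (it != my_map.end()) {
    // it->second is the token type string, e.g. "KEYWORD"
} else {
    // not a fixed lexeme: classify it as "IDENTIFIER", a number, etc.
}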
Upvotes: -1
Reputation: 15277
You are obviously not using the classical approach, in which the parser calls the scanner to get the next token. Usually pull parsers are used: the parser pulls tokens from the scanner (lexer) by calling a corresponding function. The most common scanner/parser generators, Lex/Yacc and Flex/Bison, use this approach. The parser calls something like getNextToken, and the scanner then reads bytes from the input stream until it finds a token; it does not return before a token (or an error) has been detected.
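For illustration, a minimal sketch of such a pull interface; the TokenType names and the tiny getNextToken body are just assumptions, not your actual lexer:
#include <cctype>
#include <istream>
#include <string>

enum class TokenType { Semicolon, Keyword, Identifier, EndOfInput };

struct Token {
    TokenType   type;    // what kind of token it is
    std::string value;   // the matched text, e.g. "while" or "hello"
};

class Scanner {
public:
    explicit Scanner(std::istream& in) : in_(in) {}

    // Pull interface: the parser calls this whenever it needs the next token.
    Token getNextToken() {
        int c = in_.get();
        while (c != EOF && std::isspace(c)) c = in_.get();   // skip whitespace
        if (c == EOF) return { TokenType::EndOfInput, "" };
        if (c == ';') return { TokenType::Semicolon, ";" };
        // everything else is treated as a word here (a real lexer has more cases)
        std::string word(1, static_cast<char>(c));
        while (std::isalnum(in_.peek())) word += static_cast<char>(in_.get());
        return { word == "while" ? TokenType::Keyword : TokenType::Identifier, word };
    }

private:
    std::istream& in_;
};
The parser then simply calls getNextToken() whenever it needs the next symbol and never sees the raw character stream.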
There are also push parsers. Here the input is read by the parser or by something else (e.g. a socket) and then fed into the scanner until a token can be identified, which is then returned. This is a little more complicated because the scanner needs to maintain state between calls. Recent Bison versions support this method.
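A standalone sketch of the push idea (the names are again made up); the important part is that the scanner has to keep the partially scanned lexeme as state between calls:
#include <cctype>
#include <optional>   // needs C++17
#include <string>

enum class TokType { Semicolon, Keyword, Identifier };
struct Tok { TokType type; std::string value; };

class PushScanner {
public:
    // The caller pushes one character at a time; a token is returned only
    // once the scanner has seen enough input to complete one.
    std::optional<Tok> push(char c) {
        if (std::isalnum(static_cast<unsigned char>(c))) {
            word_ += c;                      // inside a word: remember it, no token yet
            return std::nullopt;
        }
        if (!word_.empty()) {                // a non-word character finishes the word
            Tok t{ word_ == "while" ? TokType::Keyword : TokType::Identifier, word_ };
            word_.clear();
            return t;                        // (a real scanner would still have to process c)
        }
        if (c == ';') return Tok{ TokType::Semicolon, ";" };
        return std::nullopt;                 // whitespace or anything else is skipped
    }

private:
    std::string word_;   // the state that must survive between calls
};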
Common to both is the use of a class or struct (POD) "Token". This class usually contains the token type and one or more attributes, such as a value, plus many, often overloaded, setters and getters. This is normally the main interface between parser and scanner.
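If you prefer the class-with-accessors style, a purely illustrative richer variant of the minimal Token struct above could look like this (the extra line attribute is just an example):
#include <string>
#include <utility>

class Token {
public:
    enum class Type { Semicolon, Keyword, Identifier };

    Token(Type type, std::string value, int line)
        : type_(type), value_(std::move(value)), line_(line) {}

    Type               type()  const { return type_; }
    const std::string& value() const { return value_; }
    int                line()  const { return line_; }   // handy for error messages

private:
    Type        type_;
    std::string value_;
    int         line_;
};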
As far as I understand your approach, you first run the scanner, consume the whole input, and collect all tokens. That is also possible. You would then store all tokens (as described above) in a std::vector (or another STL container), and the parser would then access that vector.
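Concretely, that could be as simple as the following, reusing the hypothetical Scanner and Token from the pull sketch above:
#include <vector>

// run the scanner over the whole input and collect every token,
// instead of printing "SEMICOLON", "KEYWORD", ...
std::vector<Token> scanWholeInput(Scanner& scanner) {
    std::vector<Token> tokens;
    for (Token t = scanner.getNextToken();
         t.type != TokenType::EndOfInput;
         t = scanner.getNextToken()) {
        tokens.push_back(t);
    }
    return tokens;
}
The parser then works on the returned vector rather than on the character stream.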
For this communication you could use the mediator pattern, or you could embed the container in a "context" class that is exchanged between the scanner and the parser.
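Such a "context" class can be very small; the name and contents here are only an example:
#include <vector>

struct CompileContext {
    std::vector<Token> tokens;   // filled by the scanner, read by the parser
    // other shared state (symbol table, collected errors, ...) could also live here
};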
You can also add a member function (getToken) to your scanner class which returns one element of your container of tokens; for that you need to maintain state (the current position). An iterator for your scanner that basically forwards to the iterator of the underlying container would also be a good option. With that you can easily iterate over your tokens and implement any actions you may need, such as reading a look-ahead symbol or "ungetting" a token, as in the sketch below.
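A sketch of such a wrapper over the token container, using the Token type from the earlier sketches (getToken, peek, and unget are just the operations described above, not a fixed API):
#include <cstddef>
#include <utility>
#include <vector>

// wraps the std::vector<Token> filled by the scanner
class TokenStream {
public:
    explicit TokenStream(std::vector<Token> tokens)
        : tokens_(std::move(tokens)) {}

    // return the current token and advance
    const Token& getToken() { return tokens_[pos_++]; }

    // look ahead without consuming anything
    const Token& peek(std::size_t ahead = 0) const { return tokens_[pos_ + ahead]; }

    // give the last token back ("unget")
    void unget() { if (pos_ > 0) --pos_; }

    bool atEnd() const { return pos_ >= tokens_.size(); }

private:
    std::vector<Token> tokens_;
    std::size_t        pos_ = 0;   // the state that has to be maintained
};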
The above should basically answer your questions.
For simple grammars this will work. But for more complex grammars I would recommend the classical approach: there may be a need for context-dependent scanning, e.g. the same keyword may have to produce a different token depending on the parse context, and that cannot be handled by your approach.
I would recommend reading about Lex and Yacc, not because you should use them, but to get a deeper understanding. Or, of course, read the Dragon book or something like "Crafting a Compiler with C".
You may also want to look at two compiler examples here.
Hope I could help a little.
Upvotes: 4