Reputation: 31
I am creating a lexical analyser for a basic language that I have created. It must read text input and return the next token each time it is called. I would like it to distinguish between identifiers, constants, etc., based on a list that I define in advance.
I need to read the text file using an input stream. A while loop will go through the characters one at a time, but I need it to recognise whether the scanned characters form an identifier or an operator such as '+', '-', '*', '/', etc. What would be the best way to do this?
I am fairly new to programming, so any advice on how to structure this would be appreciated. Many thanks for any answers.
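The character-by-character loop described in the question can be sketched roughly like this. It is a minimal sketch, not a complete lexer. It assumes a made-up language whose keywords are if, while and print (hypothetical stand-ins for your own predetermined list), whose constants are plain integers, and whose operators are single characters. The class name SimpleLexer is also hypothetical. The Reader is wrapped in a PushbackReader so that the character that ends an identifier or number can be put back and seen by the next call. The record syntax needs Java 16 or later.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SimpleLexer {
    // Hypothetical token categories for the example language.
    enum Type { KEYWORD, IDENTIFIER, NUMBER, OPERATOR, EOF }

    record Token(Type type, String text) {
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    // Hypothetical pre-determined keyword list; replace with your language's own.
    private static final Set<String> KEYWORDS = Set.of("if", "while", "print");
    // Single-character operators and punctuation recognised by this sketch.
    private static final String OPERATORS = "+-*/=();";

    private final PushbackReader in;

    public SimpleLexer(Reader reader) {
        this.in = new PushbackReader(reader);
    }

    // Returns the next token each time it is called, or an EOF token at the end of input.
    public Token next() throws IOException {
        int c = in.read();
        while (c != -1 && Character.isWhitespace(c)) {  // skip spaces, tabs, newlines
            c = in.read();
        }
        if (c == -1) {
            return new Token(Type.EOF, "");
        }
        StringBuilder sb = new StringBuilder();
        if (Character.isLetter(c)) {
            // Identifier or keyword: a letter followed by letters or digits.
            while (c != -1 && Character.isLetterOrDigit(c)) {
                sb.append((char) c);
                c = in.read();
            }
            unread(c);
            String word = sb.toString();
            return new Token(KEYWORDS.contains(word) ? Type.KEYWORD : Type.IDENTIFIER, word);
        }
        if (Character.isDigit(c)) {
            // Integer constant: a run of digits.
            while (c != -1 && Character.isDigit(c)) {
                sb.append((char) c);
                c = in.read();
            }
            unread(c);
            return new Token(Type.NUMBER, sb.toString());
        }
        if (OPERATORS.indexOf(c) >= 0) {
            return new Token(Type.OPERATOR, String.valueOf((char) c));
        }
        throw new IllegalArgumentException("Unexpected character: '" + (char) c + "'");
    }

    // Push back the character that ended a word or number so the next call reads it first.
    private void unread(int c) throws IOException {
        if (c != -1) {
            in.unread(c);
        }
    }

    // Convenience helper: collect every token (excluding EOF) from a string.
    public static List<Token> tokenize(String source) {
        SimpleLexer lexer = new SimpleLexer(new StringReader(source));
        List<Token> tokens = new ArrayList<>();
        try {
            for (Token t = lexer.next(); t.type() != Type.EOF; t = lexer.next()) {
                tokens.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // Reader.read is declared to throw IOException
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("while x1 = count + 42;"));
    }
}
```

To read from a file instead of a string, you would pass something like a BufferedReader over the file to the constructor and call next() repeatedly. Multi-character operators such as "<=" would need an extra look-ahead step that this sketch leaves out.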
Upvotes: 3
Views: 3569
Reputation: 3189
Do not try to write your own lexer/parser.
It is easier to use a lexer/parser generator like ANTLR or SableCC.
Upvotes: 4
Reputation:
The StreamTokenizer class will probably help you the most. It reads the input and distinguishes between identifiers, numbers, and strings. You can also configure it to identify operators such as '+', '*', etc.
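A rough sketch of how java.io.StreamTokenizer could be used for this, assuming the same kind of hypothetical keyword list as in the question (if, while, print) and a hypothetical class name TokenizerDemo. Two defaults matter for an operator-heavy language: '/' is a comment character, and '-' is treated as the start of a negative number, so the sketch turns both into ordinary characters with ordinaryChar. Ordinary characters come back one at a time, with ttype set to the character itself. Allowing '_' in identifiers is an assumption. If you start from an InputStream, wrap it in an InputStreamReader, because the StreamTokenizer(InputStream) constructor is deprecated.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TokenizerDemo {
    // Hypothetical keyword list for the example language.
    private static final Set<String> KEYWORDS = Set.of("if", "while", "print");

    public static List<String> tokenize(Reader reader) {
        StreamTokenizer st = new StreamTokenizer(reader);
        // By default '/' starts a comment and '-' can begin a number,
        // so make both ordinary characters returned as single-char tokens.
        st.ordinaryChar('/');
        st.ordinaryChar('-');
        st.wordChars('_', '_');  // assumption: identifiers may contain underscores
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    // Words: check the pre-determined keyword list first.
                    case StreamTokenizer.TT_WORD ->
                        out.add((KEYWORDS.contains(st.sval) ? "KEYWORD " : "IDENTIFIER ") + st.sval);
                    // Numbers are always parsed as double values.
                    case StreamTokenizer.TT_NUMBER -> out.add("NUMBER " + st.nval);
                    // Quoted strings: ttype is the quote character, sval the contents.
                    case '"', '\'' -> out.add("STRING " + st.sval);
                    // Everything else is an ordinary single character, e.g. + - * / = ;
                    default -> out.add("OPERATOR " + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokenize(new StringReader("while x_1 = 42 - y / 2;"))) {
            System.out.println(token);
        }
    }
}
```

Note that StreamTokenizer reports every number as a double (nval), so a constant like 42 comes back as 42.0, and it only ever returns operators one character at a time.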
Upvotes: 2