Reputation: 1682
I've written a program that reads in a Java file including comments and outputs the file without comments.
I handle both line comments (//) and block comments (/* */). However, I only use files in which these character sequences appear nowhere else: no string literals and no Unicode escape sequences. In other words, the program only works for files that use these characters exclusively for comments. Can this program be called a parser? The grammar (either // followed by something, or /* followed by something and then */) is regular, right?
I am really only using switch case statements, i.e. implementing a finite state machine. There's no tree built and no stack. I thought that a program is only a parser when it deals with context free languages and at least has a stack, i.e. implements a pushdown automaton. But I have the feeling that the term parser is used rather freely.
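For illustration, a state machine of this kind might look roughly like the sketch below. The class name CommentStripper, the state names and the strip method are made up for this example. It assumes, as described above, that the input contains no string literals and no Unicode escapes, so // and /* always start a comment.

```java
// Hypothetical sketch: a five-state machine driven by a switch statement.
// It uses no stack and builds no tree; the only memory is the current state.
final class CommentStripper {
    private enum State { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR }

    static String strip(String source) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            switch (state) {
                case CODE:
                    // A '/' might start a comment, so hold it back for now.
                    if (c == '/') state = State.SLASH;
                    else out.append(c);
                    break;
                case SLASH:
                    if (c == '/') state = State.LINE_COMMENT;
                    else if (c == '*') state = State.BLOCK_COMMENT;
                    else {
                        // It was an ordinary '/', e.g. a division operator.
                        out.append('/').append(c);
                        state = State.CODE;
                    }
                    break;
                case LINE_COMMENT:
                    // Keep the newline that ends the comment.
                    if (c == '\n') { out.append(c); state = State.CODE; }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*') state = State.BLOCK_STAR;
                    break;
                case BLOCK_STAR:
                    if (c == '/') state = State.CODE;
                    else if (c != '*') state = State.BLOCK_COMMENT;
                    break;
            }
        }
        // Input ended right after a lone '/': emit it.
        if (state == State.SLASH) out.append('/');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(strip("int a = 1; // counter\nint b /* unused */ = 2;\n"));
    }
}
```

The number of states is fixed and independent of the input, which is exactly what makes this a finite state machine rather than a pushdown automaton.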
To clarify: I'm not looking for ways to make this program work with any Java file; I'm just interested in the correct terminology.
Upvotes: 0
Views: 148
Reputation: 3740
No. Removing comments from Java code requires only a regular expression (equivalently, a finite state automaton), so the program can't be called a "parser". A DFA (deterministic finite automaton) is an important component of a programming-language compiler, because preprocessing steps such as comment removal and the recognition of identifiers (variable, function and class names) can be done with DFAs. In fact, compiler developers widely use the lex tool (a DFA generator) to implement language-specific DFAs; e.g. the DFAs for comment recognition in C and C++ are different.
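To make the regular-expression point concrete, here is a minimal sketch under the asker's restriction (no string literals, no Unicode escapes). The class name RegexCommentStripper and the pattern are my own illustration, not output from lex. Note that java.util.regex is a backtracking engine rather than a generated DFA, but the pattern itself uses only regular constructs.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: both comment forms described by one regular expression.
final class RegexCommentStripper {
    // Either "//" up to (not including) the end of the line,
    // or "/*" up to the first following "*/". DOTALL lets '.' match newlines.
    private static final Pattern COMMENT =
            Pattern.compile("//[^\\n]*|/\\*.*?\\*/", Pattern.DOTALL);

    static String strip(String source) {
        return COMMENT.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.print(strip("int a = 1; // counter\nint b /* unused */ = 2;\n"));
    }
}
```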
The next step is to generate intermediate code for the given high-level code. For that, one has to make use of context-free grammars. It is common to use a shift-reduce parser to build an annotated parse tree for the code. A widely used tool for this task is yacc.
Upvotes: 2