Separation of tasks between lexer and parser in parsing regular expressions

Question

I am somewhat confused about the separation of tasks between a lexer and a parser.

I'm trying to write a parser that takes a Perl-style regular expression and builds a syntax tree. My problem is recognizing quantifiers such as {n,m}, which means that the preceding group, character, or character class should occur at least n but not more than m times.

The point is that an incomplete or invalid quantifier such as {2,5asdf} is not a quantifier, but a sequence of regular characters.

The question is: given the input /a{2,5}/, should a lexer return a list of tokens such as DELIMITER CHARACTER QUANTIFIER_START NUMBER COMMA NUMBER QUANTIFIER_END DELIMITER END (the problem being that QUANTIFIER_START may not be a "real" start of a quantifier, depending on what follows), or should it try to match the complete quantifier and just return QUANTIFIER, which intuitively sounds more like a task for a parser?
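
To make the two options concrete, here is a minimal Python sketch of both lexers. The function names (lex_fine, lex_coarse), the QUANTIFIER_RE pattern, and the (kind, text) token shape are hypothetical and only for illustration. The sketch assumes just the {n}, {n,} and {n,m} forms count as quantifiers, and it ignores escapes, groups and character classes.

```python
import re

# Assumption: only {n}, {n,} and {n,m} are treated as complete quantifiers.
QUANTIFIER_RE = re.compile(r"\{[0-9]+(?:,[0-9]*)?\}")

# Option 1: fine-grained tokens. The lexer has no context, so it cannot
# know whether "{" really starts a quantifier; the parser must decide.
PUNCTUATION = {
    "/": "DELIMITER",
    "{": "QUANTIFIER_START",
    "}": "QUANTIFIER_END",
    ",": "COMMA",
}

def lex_fine(src):
    tokens = []
    i = 0
    while i < len(src):
        if src[i] in "0123456789":
            # Group a run of digits into a single NUMBER token.
            j = i
            while j < len(src) and src[j] in "0123456789":
                j += 1
            tokens.append(("NUMBER", src[i:j]))
            i = j
        else:
            tokens.append((PUNCTUATION.get(src[i], "CHARACTER"), src[i]))
            i += 1
    tokens.append(("END", ""))
    return tokens

# Option 2: coarse tokens. The lexer looks ahead and emits QUANTIFIER only
# when the whole {n,m} form matches; otherwise "{" is a plain CHARACTER.
def lex_coarse(src):
    tokens = []
    i = 0
    while i < len(src):
        match = QUANTIFIER_RE.match(src, i)
        if match:
            tokens.append(("QUANTIFIER", match.group(0)))
            i = match.end()
        elif src[i] == "/":
            tokens.append(("DELIMITER", "/"))
            i += 1
        else:
            tokens.append(("CHARACTER", src[i]))
            i += 1
    tokens.append(("END", ""))
    return tokens
```

With lex_fine, the input /a{2,5}/ yields exactly the token kinds listed above, and /a{2,5asdf}/ still yields a QUANTIFIER_START that the parser would have to reinterpret. With lex_coarse, /a{2,5}/ yields a single QUANTIFIER token, while /a{2,5asdf}/ yields only CHARACTER tokens between the delimiters.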
