pokeahontas

Reputation: 37

How do I properly parse a regex in ANTLR?

I want to parse this

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

and, of course, other variations of regular expressions. Does anyone know how to do this properly?

Thanks in advance.

Edit: I tried putting all the regex symbols and characters into one lexer rule, like this:

REGEX: ( DIV | 'i' | '@' | '[' | ']' | '+' | '.' | '*' | '-' | '\\' | '(' | ')' | 'A' | 'w' | 'a' | 'z' | 'Z' );
     // | 'w' | 'a'

and then make a parser rule like this:

regex_assignment: (REGEX)+ ;

but there are recognition errors (extraneous input). This is definitely because these symbols are, of course, already used in other rules defined earlier.

The thing is, I don't actually need to process these regex assignments; I just want them to be recognized correctly, without errors. Does anyone have an approach for this in ANTLR? A solution that simply recognizes this as a regex and skips it, for example, would be enough for me.

Upvotes: 1

Views: 4391

Answers (2)

Sirmabus

Reputation: 726

There is a regex grammar now (since 2019): https://github.com/antlr/grammars-v4/tree/master/xsd-regex

Upvotes: 1

Mike Lischke

Reputation: 53317

Unfortunately, there is no regex grammar yet in the ANTLR grammar repository, but similar questions have come up before, e.g. Regex Grammar. Once you have the (E)BNF you can convert that to ANTLR. Or alternatively, you can use the BNF grammar to check your own grammar rules to see if they are correctly defined. Simply throwing together all possible input chars in a single rule won't work.
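
If the goal is only to recognize the assignment without looking inside the regex, one possible approach is to match the whole literal as a single lexer token instead of listing its characters one by one. The following is a minimal ANTLR 4 sketch, not a complete solution. The grammar name, the ID and WS rules, and the assumption that a literal sits on one line, is delimited by '/', and may end in lowercase flags are all illustrative and not taken from the question's grammar.

```antlr
grammar RegexAssign;   // hypothetical grammar name

// Parser rule: a name, '=', then one opaque regex literal token.
regex_assignment
    : ID '=' REGEX_LITERAL
    ;

// Lexer rule: '/', then a body made of escape pairs (backslash plus any
// character that is not a line break) or any character other than '/',
// backslash, or a line break, then the closing '/' and optional flags.
REGEX_LITERAL
    : '/' ( '\\' ~[\r\n] | ~[/\\\r\n] )+ '/' [a-z]*
    ;

ID  : [A-Za-z_] [A-Za-z_0-9]* ;   // assumed identifier shape
WS  : [ \t\r\n]+ -> skip ;
```

With these rules, the example from the question should lex as ID, '=', and one REGEX_LITERAL. One caveat: if the surrounding grammar also uses '/' for division (the DIV token in the question suggests it does), this rule can swallow input such as a / b / c, because the lexer prefers the longest match. Telling the two cases apart would probably need a semantic predicate or a lexer mode, which this sketch does not attempt.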

Upvotes: 1
