Reputation: 406
I'm trying to write a very simple HTML parser with ANTLR, and I'm facing a problem: the ~ rule, which should match everything up to a specified character, is not working.
My lexer grammar:
lexer grammar HtmlParserLexer;
HTML: OHTML PCDATA CHTML;
PCDATA :(~'<') ; //match all until <
OHTML: '<html>';
CHTML: '</html>';
I'm trying to match:
<html>foo bar</html>
Error from Eclipse ANTLR plugin Interpreter:
MismatchedTokenException: line 1:7 mismatched input UNKNOW expecting '<'
This suggests that my grammar ignores the PCDATA rule, and I don't know why. Thanks in advance for your help.
Upvotes: 1
Views: 147
Reputation: 170257
The rule PCDATA :(~'<') ; matches a single character other than '<'. You'll need to repeat it once or more: PCDATA :(~'<')+ ; (notice the +).
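The difference between matching one character and matching a run of characters can be seen with a regular-expression analogy. This is only a sketch in Python, not ANTLR itself: the character class [^<] is assumed to stand in for ~'<', and the names single and repeated are made up for illustration.

```python
import re

# Regex analogues of the two lexer rules (an analogy, not ANTLR itself).
single = re.compile(r"[^<]")     # like PCDATA : (~'<') ;   one character only
repeated = re.compile(r"[^<]+")  # like PCDATA : (~'<')+ ;  one or more characters

# The input that follows the opening <html> tag in the question.
text = "foo bar</html>"

one_char = single.match(text).group()
whole_run = repeated.match(text).group()

print(repr(one_char))   # 'f'
print(repr(whole_run))  # 'foo bar'
```

With the single-character form, only the f is consumed and the next character is not the expected '<', which is consistent with the mismatch reported at line 1:7.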
You may also want to allow <html></html> (nothing between <html> and </html>). In that case, you shouldn't change PCDATA :(~'<')+ ; into PCDATA :(~'<')* ;, but do this instead:
HTML: OHTML PCDATA? CHTML;
PCDATA : (~'<')+ ;
because you shouldn't create lexer rules that could potentially match an empty string.
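As a rough illustration of why the optional marker belongs on the enclosing rule, here is the same idea as a Python regex sketch. It is an analogy only, under the assumption that [^<] mirrors ~'<'; the names PCDATA_PLUS, PCDATA_STAR, and matches are hypothetical and not part of the grammar.

```python
import re

# Regex stand-ins for the two candidate PCDATA rules (analogy, not ANTLR).
PCDATA_PLUS = r"[^<]+"  # never matches an empty string
PCDATA_STAR = r"[^<]*"  # can match an empty string

# Analogue of: HTML : OHTML PCDATA? CHTML ;
html = re.compile(r"<html>(?:" + PCDATA_PLUS + r")?</html>")

def matches(s):
    """Return True if the whole string matches the HTML analogue."""
    return html.fullmatch(s) is not None

empty_ok = matches("<html></html>")         # optional PCDATA is skipped
text_ok = matches("<html>foo bar</html>")   # PCDATA consumes "foo bar"

# Right before a '<', the * form succeeds with a zero-length match,
# while the + form simply does not match.
star_len = len(re.match(PCDATA_STAR, "</html>").group())
plus_match = re.match(PCDATA_PLUS, "</html>")

print(empty_ok, text_ok, star_len, plus_match)  # True True 0 None
```

The zero-length match is the troublesome case: a token rule that succeeds without consuming any input gives a lexer nothing to advance past, whereas making the token optional in the enclosing rule keeps every produced token non-empty.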
Upvotes: 3