Reputation: 2582
I have this parser grammar, in which I also want to support something similar to JavaScript template strings.
parser grammar Test;
options {
tokenVocab = TestLexer;
}
definition: sourceElements? EOF ;
sourceElements: sourceElement+ ;
sourceElement: mapping ;
templateString: '`' TemplateStringCharacter* ('${' variable '}' TemplateStringCharacter*)+ '`' ;
fieldName: varname | ('[' value ']') ;
mapping: fieldName ':' ( '{' sourceElements '}'
| variable ( '{' sourceElements '}' )? '?'?
| value
| array )
;
funParameter: '(' value? (',' value)* ')' ;
array: '[' value? (',' value)* ']';
variable: (varname | '{' value '}' | '[' boolEx ']' | templateString) funParameter? ('.' variable)* ;
value: INT | BOOL | FLOAT | STRING | variable ;
varname: VAR ;
And this lexer grammar
lexer grammar TestLexer;
WS : [ \t\r\n\u000C]+ -> skip ;
NEWLINE : [\r\n] ;
BOOL : ('true'|'false') ;
TemplateStringLiteral : TemplateStringCharacter*;
VAR : [$]?[a-zA-Z0-9_]+|[@] ;
INT : '-'?[0-9]+ ;
FLOAT : '-'?[0-9]+'.'[0-9]+ ;
STRING : '"' DoubleStringCharacter* '"' | '\'' SingleStringCharacter* '\'' ;
TEMPSTART : '${' ;
TEMPEND : '}' ;
TemplateStart : '`' -> pushMode(template) ;
/// Comments
MultiLineComment : '/*' .*? '*/' -> channel(HIDDEN) ;
SingleLineComment : '//' ~[\r\n\u2028\u2029]* -> channel(HIDDEN) ;
mode template;
TemplateVariableStart: TEMPSTART -> pushMode(templateVariable);
TemplateStringLiteral : TemplateStringCharacter* ;
TemplateEnd : '`' -> popMode;
mode templateVariable;
WS : [ \t\r\n\u000C]+ -> skip ;
All : [^}]+ ;
TemplateVariableEnd : TEMPEND -> popMode;
fragment DoubleStringCharacter : ~["\r\n] ;
fragment SingleStringCharacter : ~['\r\n] ;
fragment TemplateStringCharacter : ~[`] ;
fragment DecimalDigit : [0-9] ;
When I input this:
test: {
abc: `Hello World`
}
The parsing tree looks like this:
(definition
(sourceElements
(sourceElement
(statement
(mapping
(fieldName
(varname test)
) : {
(sourceElements
(sourceElement
(statement mapping)
)
(sourceElement
(statement
(mapping abc : `)
)
)
(sourceElement
(statement mapping)
)
(sourceElement
(statement
(mapping Hello)
)
)
(sourceElement
(statement
(mapping World `)
)
)
)
}
)
)
)
)
<EOF>
)
And I get the error: line 2:8 no viable alternative at input 'abc:`Hello'
I don't understand why it is even possible to match something like an empty mapping, or a mapping like "World `", because a mapping needs to have a ":" in the middle. And why does the templateString rule not match the whole "Hello World" from backtick to backtick?
EDIT:
After noticing that the lexer had not been regenerated when I thought it had, I got errors like: "cannot create implicit token for string literal in non-combined grammar: ']'". So I had to move all implicit token declarations into the lexer grammar, and I changed the code to this:
parser grammar Test;
options {
tokenVocab = TestLexer;
}
definition: sourceElements? EOF ;
sourceElements: sourceElement+ ;
sourceElement: mapping ;
templateString: OpenBackTick TemplateStringLiteral* (TemplateVariableStart variable CloseBrace TemplateStringLiteral*)+ CloseBackTick ;
fieldName: varname | OpenBracket value CloseBracket ;
mapping: fieldName Colon (
OpenBrace sourceElements CloseBrace
| variable ( OpenBrace sourceElements CloseBrace )? IF?
| value
| array
)
;
funParameter: OpenParen value? (Comma value)* CloseParen ;
array: OpenBracket value? (Comma value)* CloseBracket;
variable: (varname | OpenBrace value CloseBrace | templateString) funParameter? (Dot variable)* ;
value: INT | BOOL | FLOAT | STRING | variable ;
varname: VAR ;
And lexer grammar:
lexer grammar TestLexer;
OpenBracket: '[';
CloseBracket: ']';
OpenParen: '(';
CloseParen: ')';
OpenBrace: '{' ;
CloseBrace: '}' ;
IF: '?' ;
AND: 'AND' ;
OR: 'OR';
LessThan: '<';
MoreThan: '>';
LessThanEquals: '<=';
GreaterThanEquals: '>=';
Equals: '=';
NotEquals: '!=';
IN: 'IN';
NOT: '!';
Colon: ':';
Dot: '.' ;
Comma: ',' ;
OpenBackTick : '`' -> pushMode(template) ;
WS : [ \t\r\n\u000C]+ -> skip ;
NEWLINE : [\r\n] ;
BOOL : ('true'|'false') ;
VAR : [$]?[a-zA-Z0-9_]+|[@] ;
INT : '-'?[0-9]+ ;
FLOAT : '-'?[0-9]+'.'[0-9]+ ;
STRING : '"' DoubleStringCharacter* '"' | '\'' SingleStringCharacter* '\'' ;
/// Comments
MultiLineComment : '/*' .*? '*/' -> channel(HIDDEN) ;
SingleLineComment : '//' ~[\r\n\u2028\u2029]* -> channel(HIDDEN) ;
mode template;
TemplateVariableStart: '${' -> pushMode(templateVariable);
CloseBackTick : '`' -> popMode;
TemplateStringLiteral: TemplateStringCharacter ;
mode templateVariable;
WHS : [ \t\r\n\u000C]+ -> skip ;
All : [^}]+ ;
TemplateVariableEnd : CloseBrace -> popMode;
fragment DoubleStringCharacter : ~["\r\n] ;
fragment SingleStringCharacter : ~['\r\n] ;
fragment TemplateStringCharacter : ~[`] ;
fragment DecimalDigit : [0-9] ;
Now I get the error: line 1:0 mismatched input 'test' expecting {, '?', '[', VAR}. This is strange, because 'test' should be matched by VAR. Any ideas why this is happening?
Upvotes: 0
Views: 425
Reputation: 370327
There are two lexer rules in your default mode that can match a backtick: BTICK and TemplateStart. TemplateStart will switch to the template mode, but BTICK will not. Since BTICK comes first in your grammar, it takes precedence. That means that when the lexer sees a backtick, it will generate a BTICK token and not switch modes.
To fix this, you should have only one lexer rule per mode that matches a backtick, and that rule should change the mode.
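As a rough illustration of that layout, here is a minimal sketch of a lexer/parser pair, not a drop-in replacement for your grammar. All names in it (TemplateSketchLexer, TemplateSketchParser, TemplateText, TemplateVar, TemplateWs) are hypothetical, and it assumes that only a plain variable name may appear inside ${...}. Each mode has exactly one rule that matches a backtick, and that rule pushes or pops a mode. Rule names are kept unique across modes, and the name inside ${...} is mapped back onto VAR with the type command so the parser only has to know one token type.

```antlr
// File: TemplateSketchLexer.g4 (hypothetical name, illustration only)
lexer grammar TemplateSketchLexer;

// Default mode: the ONLY rule that matches a backtick, and it switches modes.
OpenBackTick : '`' -> pushMode(template) ;
Colon        : ':' ;
VAR          : [a-zA-Z_] [a-zA-Z0-9_]* ;
WS           : [ \t\r\n]+ -> skip ;

mode template;
// Again a single backtick rule; this one returns to the previous mode.
CloseBackTick         : '`' -> popMode ;
TemplateVariableStart : '${' -> pushMode(templateVariable) ;
// Raw text up to the next backtick or '$'. A lone '$' is also text;
// for '${' the longer match TemplateVariableStart wins.
TemplateText          : ~[`$]+ | '$' ;

mode templateVariable;
TemplateVariableEnd : '}' -> popMode ;
// Assumption: only a simple name is allowed here; it is emitted as a VAR token.
TemplateVar         : [a-zA-Z_] [a-zA-Z0-9_]* -> type(VAR) ;
TemplateWs          : [ \t\r\n]+ -> skip ;

// File: TemplateSketchParser.g4 (separate file, hypothetical name)
parser grammar TemplateSketchParser;
options { tokenVocab = TemplateSketchLexer; }

mapping        : VAR Colon templateString EOF ;
// '*' rather than '+', so a template without any ${...} is accepted too.
templateString : OpenBackTick
                 ( TemplateText | TemplateVariableStart VAR TemplateVariableEnd )*
                 CloseBackTick ;
```

With this sketch, an input such as abc: `Hello World` should lex as VAR, Colon, OpenBackTick, a single TemplateText token and CloseBackTick. Dumping the token stream (for example with the TestRig -tokens option) is a quick way to confirm that the mode switch actually happens before you debug the parser rules.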
I don't understand, why it is even possible to match something like an empty mapping or a mapping like "World `" because a mapping would need to have a ":" in the middle.
When your input contains a syntax error, the parser still tries to recover and build a tree, so the generated parse tree can contain constructs that aren't actually valid. When your input parses without errors, you'll get a tree that makes sense.
Upvotes: 1