Reputation: 2425
I already have a DSL and would like to build an ANTLR4 grammar for it.
Here is an example of that DSL:
rule isC {
    true when O_M in [5, 6, 17, 34]
    false in other cases
}
rule isContract {
    true when O_C in ['XX','XY','YY']
    false in other cases
}
rule isFixed {
    true when F3 ==~ '.*/.*/.*-F.*/.*'
    false in other cases
}
rule temp[1].future {
    false when O_OF in ['C','P']
    true in other cases
}
rule temp[0].scale {
    10 when O_M == 5 && O_C in ['YX']
    1 in other cases
}
Right now the DSL is parsed using regular expressions, which have become a total mess, so a grammar is needed.
The way it works is the following: it extracts the left part (before when) and the right part, and both are evaluated by Groovy.
I would still like to have them evaluated by Groovy, but organize the parsing process using a grammar. So, in essence, what I need is to extract these left and right parts using some kind of wildcards.
Unfortunately, I cannot figure out how to do that. Here is what I have so far:
grammar RuleDSL;
rules: basic_rule+ EOF;
basic_rule: 'rule' rule_name '{' condition_expr+ '}';
name: CHAR+;
list_index: '[' DIGIT+ ']';
name_expr: name list_index*;
rule_name: name_expr ('.' name_expr)*;
condition_expr: when_condition_expr | otherwise_condition_expr;
condition: .*?;
result: .*?;
when_condition_expr: result WHEN condition;
otherwise_condition_expr: result IN_OTHER_CASES;
WHEN: 'when';
IN_OTHER_CASES: 'in other cases';
DIGIT: '0'..'9';
CHAR: 'a'..'z' | 'A'..'Z';
SYMBOL: '?' | '!' | '&' | '.' | ',' | '(' | ')' | '[' | ']' | '\\' | '/' | '%'
| '*' | '-' | '+' | '=' | '<' | '>' | '_' | '|' | '"' | '\'' | '~';
// Whitespace and comments
WS: [ \t\r\n\u000C]+ -> skip;
COMMENT: '/*' .*? '*/' -> skip;
This grammar is too greedy, and only one rule is processed. That is, if I listen to the parse with
@Override
public void enterBasic_rule(Basic_ruleContext ctx) {
    System.out.println("ENTERING RULE");
}

@Override
public void exitBasic_rule(Basic_ruleContext ctx) {
    System.out.println(ctx.getText());
    System.out.println("LEAVING RULE");
}
I have the following as output
ENTERING RULE
-- tons of text
LEAVING RULE
How can I make it less greedy, so that parsing the given input yields 5 rules? The greediness comes from condition and result, I suppose.
UPDATE: It turned out that skipping whitespace wasn't the best idea, so after a while I ended up with the following: link to gist
Thanks 280Z28 for the hint!
Upvotes: 0
Views: 236
Reputation: 99859
Instead of using .*? in your parser rules, try using ~'}'* to ensure that those rules won't try to read past the end of the rule.
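A minimal sketch of that substitution, assuming the rest of the question's grammar stays as posted. The '}' literal is already an implicit token because basic_rule uses it, so it can appear in a parser-level negated set. Excluding WHEN and IN_OTHER_CASES from result is an addition made for this sketch, not part of the suggestion above.

```antlr
// Sketch only: bounded replacements for the two wildcard parser rules.
// ~'}' matches any single token except the closing brace, so neither
// rule can consume tokens that belong to the next basic_rule.
condition : ~'}'* ;

// Assumption: a result never contains the WHEN or IN_OTHER_CASES
// keywords, so it stops at whichever of them comes first.
result    : ~('}' | WHEN | IN_OTHER_CASES)* ;
```

This keeps a rule from running past its own closing brace. With whitespace skipped there is still no token marking where one condition ends and the next result begins, so a condition can absorb the lines that follow it inside the same block; keeping newlines as tokens is one possible way to mark that boundary.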
Also, you skip whitespace in your lexer but use CHAR+ and DIGIT+ in your parser rules. This means the following are equivalent:
rule temp[1].future
rule t e m p [ 1 ] . f u t u r e
Beyond that, you made in other cases a single token instead of 3, so the following are not equivalent:
true in other cases
true in  other   cases
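One way to make the two lines equivalent, sketched under the assumption that each word becomes its own keyword token. The token names IN, OTHER and CASES are hypothetical and do not appear in the question's grammar.

```antlr
// Sketch only; the token names IN, OTHER and CASES are hypothetical.
// With one keyword per token, the skipped WS rule absorbs whatever
// whitespace separates the words.
otherwise_condition_expr : result IN OTHER CASES ;

IN    : 'in' ;
OTHER : 'other' ;
CASES : 'cases' ;
```

These rules would need to appear before any general identifier rule in the lexer, because ANTLR resolves equal-length matches in favor of the rule defined first. Note that IN would then also be the token produced for the in operator inside conditions.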
You should probably start by making the following lexer rules, and then making the CHAR and DIGIT rules fragment rules:
ID : CHAR+;
INT : DIGIT+;
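Put together, that change might look like the sketch below. The rewritten name and list_index parser rules are an assumption about how the question's grammar would be adapted; they are not taken from the answer.

```antlr
// Sketch only: CHAR and DIGIT become fragments, so the lexer never
// emits them as tokens; it emits whole ID and INT tokens instead.
ID  : CHAR+ ;
INT : DIGIT+ ;

fragment CHAR  : 'a'..'z' | 'A'..'Z' ;
fragment DIGIT : '0'..'9' ;

// Parser rules from the question, rewritten against the new tokens
// (an assumed adaptation).
name       : ID ;
list_index : '[' INT ']' ;
```

With whole-word tokens, "t e m p" lexes as four ID tokens and no longer matches a rule that expects a single name. ID should be declared after WHEN so that the keyword still wins when both rules match the same text.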
Upvotes: 2