Reputation: 13
I took inspiration from the DOT.g4 grammar in the GitHub repository grammars-v4/dot/DOT.g4, because I also have a DOT file to parse.
This is a possible structure of my DOT file:
digraph G {
rankdir=LR
label="\n[Büchi]"
labelloc="t"
node [shape="circle"]
I [label="", style=invis, width=0]
I -> 34
0 [label="0", peripheries=2]
0 -> 0 [label="!v_0"]
1 [label="1", peripheries=2]
1 -> 1 [label="!v_2 & !v_5"]
2 [label="2"]
2 -> 1 [label="v_0 & v_1 > 5 & !v_2 & v_3 < 8 & !v_5"]
3 [label="3"]
3 -> 1 [label="v_0 & v_1 > 5 & !v_2 & v_3 < 8 & !v_5"]
4 [label="4"]
4 -> 1 [label="v_1 > 5 & !v_2 & v_3 < 8 & !v_5"]
5 [label="5"]
5 -> 1 [label="v_0 & v_1 > 5 & !v_2 & v_3 < 8 & !v_5"]
}
And here is my grammar.g4 file, which I modified from the grammar linked above:
parse: nba| EOF;
nba: STRICT? ( GRAPH | DIGRAPH ) ( initialId? ) '{' stmtList '}';
stmtList : ( stmt ';'? )* ;
stmt: nodeStmt| edgeStmt| attrStmt | initialId '=' initialId;
attrStmt: ( GRAPH | NODE | EDGE ) '[' a_list? ']';
a_list: ( initialId ( '=' initialId )? ','? )+;
edgeStmt: (node_id) edgeRHS label ',' a_list? ']';
label: ('[' LABEL '=' '"' (id)+ '"' );
edgeRHS: ( edgeop ( node_id ) )+;
edgeop: '->';
nodeStmt: node_id label? ',' a_list? ']';
node_id: initialId ;
id: ID | SPACE | DIGIT | LETTER | SYMBOL | STRING ;
initialId : STRING | LETTER | DIGIT;
And here are the lexer rules:
GRAPH: [Gg] [Rr] [Aa] [Pp] [Hh];
DIGRAPH: [Dd] [Ii] [Gg] [Rr] [Aa] [Pp] [Hh];
NODE: [Nn] [Oo] [Dd] [Ee];
EDGE: [Ee] [Dd] [Gg] [Ee];
LABEL: [Ll] [Aa] [Bb] [Ee] [Ll];
/** "a numeral [-]?(.[0-9]+ | [0-9]+(.[0-9]*)? )" */
NUMBER: '-'? ( '.' DIGIT+ | DIGIT+ ( '.' DIGIT* )? );
DIGIT: [0-9];
/** "any double-quoted string ("...") possibly containing escaped quotes" */
STRING: '"' ( '\\"' | . )*? '"';
/** "Any string of alphabetic ([a-zA-Z\200-\377]) characters, underscores
* ('_') or digits ([0-9]), not beginning with a digit"
*/
ID: LETTER ( LETTER | DIGIT )*;
SPACE: '" "';
LETTER: [a-zA-Z\u0080-\u00FF_];
SYMBOL: '<'| '>'| '&'| 'U'| '!';
COMMENT: '/*' .*? '*/' -> skip;
LINE_COMMENT: '//' .*? '\r'? '\n' -> skip;
/** "a '#' character is considered a line output from a C preprocessor */
PREPROC: '#' ~[\r\n]* -> skip;
/*whitespace are ignored from the constructor*/
WS: [ \t\n\r]+ -> skip;
I clicked on the ANTLR Recognizer section, which generated the Java files and the tokens needed to interpret the grammar. Now I have to build a parser in which I override some methods so that my Java code works with the Java files generated by ANTLR4. But first I want to understand whether my grammar is correct for this kind of DOT file. How can I verify that?
Upvotes: 0
Views: 770
Reputation: 6785
Re: "I clicked on the ANTLR Recognizer"... sounds like you're using some sort of IDE with a plugin, or another ANTLR tool. I use VS Code and IntelliJ with plugins, but neither has an "ANTLR Recognizer" section (that I can see), so the following assumes the command line. It's simple command-line stuff and definitely worth learning early on when using ANTLR. (Both of the plugins I use can also show the token stream and parse tree from within the plugin, though.)
If you follow the "QuickStart" at www.antlr.org, you'll have created the grun alias, which is useful for just this purpose.
(Assuming your grammar name is DOT.)
To dump out your token stream (the result of all your lexer rules):
grun DOT tokens -tokens
To verify that you're parsing input correctly:
grun DOT parse -gui
or
grun DOT parse -tree
BTW, it's rather unlikely that you'll need to override the parser class. First take a look at Visitors and Listeners.
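As a rough illustration of the listener approach, here is a minimal sketch. It assumes the grammar is named DOT (so ANTLR generates DOTLexer, DOTParser and DOTBaseListener), that parse is the start rule and edgeStmt is a parser rule as in the question, and that the ANTLR runtime jar and the generated classes are on the classpath. The class names DotCheck and EdgePrinter are made up for this example.

```java
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

// Hypothetical driver class; DOTLexer, DOTParser and DOTBaseListener are the
// classes ANTLR generates for a grammar named DOT (an assumption here).
public class DotCheck {

    // Extend the generated base listener instead of overriding the parser.
    static class EdgePrinter extends DOTBaseListener {
        @Override
        public void enterEdgeStmt(DOTParser.EdgeStmtContext ctx) {
            // getText() concatenates the matched tokens; skipped whitespace is not included.
            System.out.println("edge: " + ctx.getText());
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the DOT file to check.
        DOTLexer lexer = new DOTLexer(CharStreams.fromFileName(args[0]));
        DOTParser parser = new DOTParser(new CommonTokenStream(lexer));

        // Invoke the start rule; syntax errors are reported on stderr by default.
        ParseTree tree = parser.parse();
        System.out.println("syntax errors: " + parser.getNumberOfSyntaxErrors());

        // Walk the tree; the listener callback fires for each edgeStmt node.
        ParseTreeWalker.DEFAULT.walk(new EdgePrinter(), tree);
    }
}
```

A non-zero error count, or edges missing from the output, would suggest that the grammar does not match the input the way you expect; the -tokens and -tree output from grun should help narrow down where.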
Upvotes: 0