Caterina

Reputation: 13

Grammar in ANTLR4

I took inspiration from the DOT.g4 grammar in the GitHub repository grammars-v4/dot/DOT.g4, because I also have a DOT file to parse.

This is a possible structure of my DOT file:

digraph G {
  rankdir=LR
  label="\n[Büchi]"
  labelloc="t"
  node [shape="circle"]
  I [label="", style=invis, width=0]
  I -> 34
  0 [label="0", peripheries=2]
  0 -> 0 [label="!v_0"]
  1 [label="1", peripheries=2]
  1 -> 1 [label="!v_2 & !v_5"]
  2 [label="2"]
  2 -> 1 [label="v_0 & v_1 > 5 & !v_2 & v_3 < 8 & !v_5"]
  3 [label="3"]
  3 -> 1 [label="v_0 & v_1 > 5 & !v_2 & v_3 < 8 & !v_5"]
  4 [label="4"]
  4 -> 1 [label="v_1 > 5 & !v_2 & v_3 < 8 & !v_5"]
  5 [label="5"]
  5 -> 1 [label="v_0 & v_1 > 5 & !v_2 & v_3 < 8 & !v_5"]
}

And here is my grammar.g4 file, which I modified from the link above:

parse: nba| EOF;
nba: STRICT? ( GRAPH | DIGRAPH ) ( initialId? ) '{' stmtList '}';
stmtList : ( stmt ';'? )* ;
stmt: nodeStmt| edgeStmt| attrStmt | initialId '=' initialId;
attrStmt: ( GRAPH | NODE | EDGE )  '[' a_list? ']';
a_list: ( initialId ( '=' initialId  )? ','? )+;
edgeStmt: (node_id) edgeRHS label ',' a_list? ']';
label: ('[' LABEL '=' '"' (id)+ '"' );
edgeRHS: ( edgeop ( node_id ) )+;
edgeop: '->';
nodeStmt: node_id label? ',' a_list? ']';
node_id: initialId ;
id: ID | SPACE | DIGIT | LETTER | SYMBOL | STRING ;
initialId : STRING | LETTER | DIGIT;

And here are the lexer rules:

GRAPH: [Gg] [Rr] [Aa] [Pp] [Hh];
DIGRAPH: [Dd] [Ii] [Gg] [Rr] [Aa] [Pp] [Hh];
NODE: [Nn] [Oo] [Dd] [Ee];
EDGE: [Ee] [Dd] [Gg] [Ee];
LABEL: [Ll] [Aa] [Bb] [Ee] [Ll];
/** "a numeral [-]?(.[0-9]+ | [0-9]+(.[0-9]*)? )" */
NUMBER: '-'? ( '.' DIGIT+ | DIGIT+ ( '.' DIGIT* )? );
DIGIT: [0-9];
/** "any double-quoted string ("...") possibly containing escaped quotes" */
STRING: '"' ( '\\"' | . )*? '"';
/** "Any string of alphabetic ([a-zA-Z\200-\377]) characters, underscores
 *  ('_') or digits ([0-9]), not beginning with a digit"
*/
ID: LETTER ( LETTER | DIGIT )*;
SPACE: '" "';
LETTER: [a-zA-Z\u0080-\u00FF_];
SYMBOL: '<'| '>'| '&'| 'U'| '!';
COMMENT: '/*' .*? '*/' -> skip;
LINE_COMMENT: '//' .*? '\r'? '\n' -> skip;
/** "a '#' character is considered a line output from a C preprocessor */
PREPROC: '#' ~[\r\n]* -> skip;
/*whitespace are ignored from the constructor*/
WS: [ \t\n\r]+ -> skip;

I clicked on the ANTLR Recognizer section, which generated the Java files and the tokens used to interpret the grammar. Now I have to build a parser in which I override some methods so that my Java code works with the Java files generated by ANTLR4. But first I want to understand whether my grammar is correct for this kind of DOT file. How can I verify that?

Upvotes: 0

Views: 770

Answers (1)

Mike Cargal

Reputation: 6785

Re: "I clicked on the ANTLR Recognizer"... it sounds like you're using some sort of IDE with a plugin, or another ANTLR tool. I use VS Code and IntelliJ with plugins, but neither has an "ANTLR Recognizer" section (that I can see), so the following assumes you're using the command line. It's simple command-line stuff and definitely worth learning early on when using ANTLR. (Both of the plugins I use also let you view the token stream and parse tree from within the plugin, though.)

If you follow the "QuickStart" at www.antlr.org, you'll have created the grun alias, which is useful for just this purpose.

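(Assuming your grammar name is DOT. In each command below, put your input file as the last argument; without one, grun reads from standard input.)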

To dump out your token stream (the result of all your lexer rules):

grun DOT tokens -tokens

To verify that you're parsing input correctly:

grun DOT parse -gui

or

grun DOT parse -tree

BTW, it's rather unlikely that you'll need to override the parser class. First take a look at Visitors and Listeners.
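
To give a feel for what a listener does, here is a minimal, dependency-free sketch of the idea. It is not ANTLR code: Node, Listener, walk and EdgeCollector are hypothetical stand-ins I made up so the example compiles on its own, and the tree is built by hand instead of by a parser. With a grammar named DOT, ANTLR would instead generate DOTListener and DOTBaseListener with one enter/exit pair per rule (for example enterEdgeStmt), and you would walk the real parse tree with ParseTreeWalker.DEFAULT.walk(listener, tree).

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {
    /** Hypothetical stand-in for a parse-tree node: a rule name, optional text, children. */
    static final class Node {
        final String rule;
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String rule, String text) { this.rule = rule; this.text = text; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Callbacks fired by the walker (ANTLR generates one enter/exit pair per grammar rule). */
    interface Listener {
        default void enter(Node n) {}
        default void exit(Node n) {}
    }

    /** Depth-first walk: enter is called before the children, exit after them. */
    static void walk(Listener listener, Node n) {
        listener.enter(n);
        for (Node child : n.children) walk(listener, child);
        listener.exit(n);
    }

    /** Collects "source->target" for every edgeStmt node and ignores everything else. */
    static final class EdgeCollector implements Listener {
        final List<String> edges = new ArrayList<>();
        @Override public void enter(Node n) {
            if (n.rule.equals("edgeStmt")) {
                edges.add(n.children.get(0).text + "->" + n.children.get(1).text);
            }
        }
    }

    /** Hand-built tree for the statements "I -> 34", "0 [...]" and "0 -> 0". */
    static List<String> demo() {
        Node tree = new Node("stmtList", null)
            .add(new Node("edgeStmt", null)
                .add(new Node("node_id", "I")).add(new Node("node_id", "34")))
            .add(new Node("nodeStmt", null).add(new Node("node_id", "0")))
            .add(new Node("edgeStmt", null)
                .add(new Node("node_id", "0")).add(new Node("node_id", "0")));
        EdgeCollector collector = new EdgeCollector();
        walk(collector, tree);
        return collector.edges;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [I->34, 0->0]
    }
}
```

The point of the pattern is that your own logic lives in the listener, while the parser and the tree walk stay untouched, which is why overriding the generated parser is rarely needed.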

Upvotes: 0
