Reputation: 36745
I want to build a parser for analyzing a large input file, but I don't need the entire input file, only some parts of it.
For example, the input file may look like this:
bla bla bla bla bla ...
EVENT: e1
type: t1
version: 1
additional-info: abc
EVENT: e2
type: t2
version: 1
uninteresting-info: def
blu blu blu blu blu ...
From this file, all I want is a map from event to type (e1=>t1, e2=>t2). All other information is of no interest to me.
How can I build a simple ANTLR grammar that does this?
Upvotes: 3
Views: 1160
Reputation: 170158
You can do that by introducing a boolean flag inside your lexer that keeps track of whether an event- or type-keyword has been encountered. If one has been encountered, the lexer keeps the words that follow it on that line; all other words are skipped.
A small demo:
grammar T;

@lexer::members {
  // true while words should be discarded; reset at every line break
  private boolean ignoreWord = true;
}

parse
  : event* EOF
  ;

event
  : Event w1=Word Type w2=Word
    {System.out.println("event=" + $w1.text + ", type=" + $w2.text);}
  ;

// the two keywords switch word-skipping off for the rest of the line
Event
  : 'EVENT:' {ignoreWord=false;}
  ;

Type
  : 'type:' {ignoreWord=false;}
  ;

Word
  : ('a'..'z' | 'A'..'Z' | '0'..'9')+ {if(ignoreWord) skip();}
  ;

NewLine
  : ('\r'? '\n' | '\r') {ignoreWord=true; skip();}
  ;

// any other single character is discarded
Other
  : . {skip();}
  ;
You can test the parser with the following class:
import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src =
        "bla bla bla bla bla ... \n" +
        " \n" +
        "prEVENT: ... \n" +
        "EVENTs: ... \n" +
        " \n" +
        "EVENT: e1 \n" +
        "type: t1 \n" +
        "version: 1 \n" +
        "additional-info: abc \n" +
        " \n" +
        "EVENT: e2 \n" +
        "type: t2 \n" +
        "version: 1 \n" +
        "uninteresting-info: def \n" +
        " \n" +
        "blu blu blu blu blu ... \n";
    TLexer lexer = new TLexer(new ANTLRStringStream(src));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}
Generating the lexer and parser, compiling everything, and running Main produces the following output:
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
event=e1, type=t1
event=e2, type=t2
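To see the flag logic on its own, and to collect the pairs into the map the question asks for instead of printing them, here is a rough plain-Java sketch. It does not use ANTLR: a regular expression with ordered alternatives stands in for the lexer rules, and the names FlagLexerSketch, keptTokens and eventTypes are made up for this illustration. It assumes every EVENT: line is followed by a type: line, as in the sample input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated lexer/parser; not ANTLR code.
public class FlagLexerSketch {

    // Alternatives are tried in order, mirroring the lexer rules
    // Event, Type, Word, NewLine, Other.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<event>EVENT:)|(?<type>type:)|(?<word>[a-zA-Z0-9]+)"
            + "|(?<newline>\\r?\\n|\\r)|(?<other>.)",
        Pattern.DOTALL);

    /** Returns the tokens that are not skipped, in input order. */
    public static List<String> keptTokens(String src) {
        List<String> kept = new ArrayList<>();
        boolean ignoreWord = true;           // same role as the lexer member
        Matcher m = TOKEN.matcher(src);
        while (m.find()) {
            if (m.group("event") != null || m.group("type") != null) {
                ignoreWord = false;          // keep words on the rest of this line
                kept.add(m.group());
            } else if (m.group("word") != null) {
                if (!ignoreWord) {
                    kept.add(m.group());
                }
            } else if (m.group("newline") != null) {
                ignoreWord = true;           // a line break re-enables skipping
            }
            // every other character is dropped, like the Other rule
        }
        return kept;
    }

    /** Pairs up "EVENT: x type: y" token runs, like the event parser rule. */
    public static Map<String, String> eventTypes(String src) {
        List<String> t = keptTokens(src);
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < t.size(); i += 4) {
            if (i + 3 >= t.size()
                    || !t.get(i).equals("EVENT:")
                    || !t.get(i + 2).equals("type:")) {
                throw new IllegalArgumentException("unexpected token at index " + i);
            }
            result.put(t.get(i + 1), t.get(i + 3));
        }
        return result;
    }

    public static void main(String[] args) {
        String src =
            "bla bla ... \n"
            + "prEVENT: ... \n"
            + "EVENTs: ... \n"
            + "EVENT: e1 \n"
            + "type: t1 \n"
            + "version: 1 \n"
            + "EVENT: e2 \n"
            + "type: t2 \n"
            + "uninteresting-info: def \n";
        System.out.println(eventTypes(src)); // prints {e1=t1, e2=t2}
    }
}
```

Within the ANTLR grammar itself, the same result could presumably be reached by declaring a map in the parser's members section and filling it in the action of the event rule in place of the println call.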
Upvotes: 3