sol

Reputation: 323

ANTLR3 reports java.lang.OutOfMemoryError when parsing an expression

I am trying to match the string "match 'match content'" and extract the content inside the single quotes, but parsing throws the following exception:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.antlr.runtime.Lexer.emit(Lexer.java:160)
    at org.antlr.runtime.Lexer.nextToken(Lexer.java:91)
    at org.antlr.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:133)
    at org.antlr.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:127)
    at org.antlr.runtime.CommonTokenStream.consume(CommonTokenStream.java:70)
    at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:106)

I don't know why an OutOfMemoryError is thrown, and I cannot find the mistake in my .g file.

My .g file:

grammar Contains;

options {
    language=Java;
    output=AST;
    ASTLabelType=CommonTree;
    backtrack=false;
    k=3;
}

match
    :
    KW_MATCH SINGLE_QUOTE ( ~(SINGLE_QUOTE|'\\') | ('\\' .) )+ SINGLE_QUOTE
    ;

regexp 
    :
    KW_REGEXP SINGLE_QUOTE RegexComponent+ SINGLE_QUOTE
    ;

range 
    :
    KW_RANGE  SINGLE_QUOTE left=(LPAREN | LSQUARE) start=Number COMMA end = Number right=(RPAREN | RSQUARE) SINGLE_QUOTE
    ;


DOT : '.'; // generated as a part of Number rule
COLON : ':' ;
COMMA : ',' ;

LPAREN : '(' ;
RPAREN : ')' ;
LSQUARE : '[' ;
RSQUARE : ']' ;
LCURLY : '{';
RCURLY : '}';

PLUS : '+';
MINUS : '-';
STAR : '*';

BITWISEOR : '|';
BITWISEXOR : '^';
QUESTION : '?';
DOLLAR : '$';

KW_RANGE : 'RANGE';
KW_REGEXP : 'REGEXP';
KW_MATCH : 'MATCH';

DOUBLE_QUOTE : '\"';
SINGLE_QUOTE : '\'';

fragment
Digit
    :
    '0'..'9'
    ;

fragment
Exponent
    :
    ('e' | 'E') ( PLUS|MINUS )? (Digit)+
    ;

fragment
RegexComponent
    : 'a'..'z' | 'A'..'Z' | '0'..'9' | '_'
    | PLUS | STAR | QUESTION | MINUS | DOT
    | LPAREN | RPAREN | LSQUARE | RSQUARE | LCURLY | RCURLY
    | BITWISEXOR | BITWISEOR | DOLLAR | '\u0080'..'\u00FF' | '\u0400'..'\u04FF'
    | '\u0600'..'\u06FF' | '\u0900'..'\u09FF' | '\u4E00'..'\u9FFF' | '\u0A00'..'\u0A7F'
    ;

Number
    :
    (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)?
    ;

WS  :  (' '|'\r'|'\t'|'\n'|'\u000C')* {$channel=HIDDEN;}
    ;

Upvotes: 2

Views: 81

Answers (1)

Bart Kiers

Reputation: 170158

You could start by changing:

WS  :  (' '|'\r'|'\t'|'\n'|'\u000C')* {$channel=HIDDEN;}
    ;

to:

WS  :  (' '|'\r'|'\t'|'\n'|'\u000C')+ {$channel=HIDDEN;}
    ;

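Your version can match the empty string. When that happens the lexer emits a zero-length WS token without consuming any input, so it may keep producing tokens indefinitely, and the buffered token stream grows until the JVM throws an OutOfMemoryError.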
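To see why a rule that can match nothing is dangerous, here is a minimal sketch in plain Java. It is not ANTLR's actual implementation: the class `EmptyTokenDemo`, the method `countTokens`, and the cap `MAX_TOKENS` are hypothetical names made up for this illustration, and `java.util.regex` stands in for the generated lexer. The loop emits one token per match and advances by the length of the match, so a zero-length match never advances.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyTokenDemo {
    // Hypothetical safety cap so the demo terminates; a real lexer has no such limit.
    static final int MAX_TOKENS = 1000;

    /**
     * Toy tokenizer loop: tries wsRule at the current position and emits one
     * token per successful match. Returns the number of tokens emitted, or -1
     * if MAX_TOKENS was reached before the end of the input.
     */
    static int countTokens(String input, Pattern wsRule) {
        int pos = 0;
        int tokens = 0;
        while (pos < input.length()) {
            if (tokens >= MAX_TOKENS) {
                return -1; // still not at end of input: the loop is stuck
            }
            Matcher m = wsRule.matcher(input);
            m.region(pos, input.length());
            if (m.lookingAt()) {
                tokens++;
                pos = m.end(); // a zero-length match leaves pos unchanged
            } else {
                pos++; // no rule matched: skip one character (stand-in for error recovery)
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern star = Pattern.compile("[ \\t\\r\\n\\f]*"); // like the original WS rule
        Pattern plus = Pattern.compile("[ \\t\\r\\n\\f]+"); // like the corrected WS rule
        String input = "  x  ";
        System.out.println("plus: " + countTokens(input, plus));
        System.out.println("star: " + countTokens(input, star));
    }
}
```

With the `+` version the loop reaches the end of the input after two whitespace tokens. With the `*` version the rule matches the empty string in front of `x`, the position never moves, and only the artificial cap stops the loop. In the real lexer there is no cap, so each empty token is buffered until memory runs out.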

Upvotes: 1
