user1019710

ANTLR match only a specific string within a stream and ignore the rest

First of all, I am new to ANTLR. What I am asking may be trivial for most of you, but I need your help.

I want to match all the qualified names within a stream, and ignore the rest of the characters from the stream.

I tried the following:

findAllQualifiedNames
    :   qualifiedName+
    ;

qualifiedName 
    :   IDENTIFIER
        ('.' IDENTIFIER)*
    ;

IDENTIFIER
    :   ('_'
    |   '$'
    |   ('a'..'z' | 'A'..'Z'))
        ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$')*
    ;
AnyOtherChar
    :   . 
    {$channel=HIDDEN;}
    ;

But it doesn't work the way I expected: for the input a.b.c;d.e.f;, it matches only a.b.c as a qualified name, and I get the error:

No viable alternative at ;

EDIT:

For the grammar above, I tried the following input: a.b.c; d.e.f; .. {x.y;}

I expected to match a.b.c, d.e.f and x.y, but I get the following:

[Screenshot: Eclipse plugin interpreter]


Answers (1)

Bart Kiers

"But it doesn't work the way I expected: for the input a.b.c;d.e.f;, it matches only a.b.c as a qualified name. And I get the error: ..."

I cannot reproduce that.

Using the debugger from ANTLRWorks 1.4.3, I get the following parse tree:

[Screenshot: parse tree in the ANTLRWorks debugger]

(As you can see, no error or warning is printed in the output pane in the lower-left corner.)

Of course, you'll need to account for text inside string literals and comments that "looks" like a qualified name, but I showed that in a previous Q&A of yours. (I'm posting this last remark mainly for future readers who might think it's that easy to fetch all qualified names from a Java source file.)
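To illustrate that caveat, here is a small Python sketch. It is not ANTLR: the regular expression is only an assumed approximation of the question's qualifiedName rule, and the Java snippet in `source` is made up for the example. It shows that a naive match also picks up names inside a string literal and a comment.

```python
import re

# Assumption: this pattern approximates the question's qualifiedName rule
# (IDENTIFIER followed by zero or more '.' IDENTIFIER parts).
QUALIFIED_NAME = re.compile(
    r"[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*"
)

# Hypothetical Java snippet with a string literal and a line comment.
source = 'String s = "foo.bar"; // see baz.qux\nSystem.out.println(s);'

names = QUALIFIED_NAME.findall(source)
print(names)
# "foo.bar" (from the string literal) and "baz.qux" (from the comment)
# are reported alongside the real name System.out.println.
```

A real solution would first have to recognize string literals and comments as tokens of their own, so that their contents are never considered.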

EDIT

The input a.b.c; d.e.f; .. {x.y;} produces errors because of the two dots in it. Each dot is tokenized as a separate '.' token, not as an AnyOtherChar token, so it stays on the default channel and the parser sees a '.' that is not followed by an IDENTIFIER.

Defining literal tokens inside parser rules (as you did with '.' in qualifiedName) does not restrict those tokens to those parser rules: the lexer creates them wherever they occur in the input. The following two grammars are equivalent:

Grammar 1:

qualifiedName : IDENTIFIER ('.' IDENTIFIER)*;
IDENTIFIER    : ('_' | '$' | 'a'..'z' | 'A'..'Z')+;

Grammar 2:

qualifiedName : IDENTIFIER (DOT IDENTIFIER)*;
IDENTIFIER    : ('_' | '$' | 'a'..'z' | 'A'..'Z')+;
DOT           : '.';
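
To make the effect concrete, here is a rough Python simulation (not ANTLR itself) of how the lexer and parser rules from the question treat the two inputs. The regex-based tokenizer, the function names and the error message are assumptions made for illustration; ANTLR's own messages are worded differently.

```python
import re

# Assumption: these alternatives mirror the question's lexer rules, with the
# literal '.' from the parser rule acting as its own token type (DOT).
TOKEN_RE = re.compile(
    r"""
    (?P<IDENTIFIER>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<DOT>\.)
  | (?P<AnyOtherChar>.)
    """,
    re.VERBOSE | re.DOTALL,
)


def default_channel_tokens(text):
    """Return the (type, text) pairs the parser sees; AnyOtherChar is hidden."""
    return [
        (m.lastgroup, m.group())
        for m in TOKEN_RE.finditer(text)
        if m.lastgroup != "AnyOtherChar"
    ]


def find_all_qualified_names(text):
    """Mimic qualifiedName+ (the at-least-one requirement is not enforced)."""
    tokens = default_channel_tokens(text)
    names, i = [], 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind != "IDENTIFIER":
            raise SyntaxError(f"unexpected {value!r} at token {i}")
        parts = [value]
        i += 1
        # ('.' IDENTIFIER)*: a DOT commits the rule to another IDENTIFIER.
        while i < len(tokens) and tokens[i][0] == "DOT":
            if i + 1 >= len(tokens) or tokens[i + 1][0] != "IDENTIFIER":
                raise SyntaxError(f"expected IDENTIFIER after '.' at token {i}")
            parts.append(tokens[i + 1][1])
            i += 2
        names.append(".".join(parts))
    return names


print(find_all_qualified_names("a.b.c;d.e.f;"))  # ['a.b.c', 'd.e.f']

try:
    find_all_qualified_names("a.b.c; d.e.f; .. {x.y;}")
except SyntaxError as error:
    # The first stray dot follows d.e.f directly on the default channel.
    print("parse failed:", error)
```

The first input succeeds because ';' is hidden and every remaining dot sits between two identifiers. In the second input, the space, ';' and '{' are hidden too, but the two stray dots are not, so they appear right after d.e.f in the token stream.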
