Reputation: 47
I have the following grammar:
GUID : GUIDBLOCK GUIDBLOCK '-' GUIDBLOCK '-' GUIDBLOCK '-' GUIDBLOCK '-' GUIDBLOCK GUIDBLOCK GUIDBLOCK;
SELF : 'self(' GUID ')';
fragment GUIDBLOCK : [A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9];
atom : SELF # CurrentGuid ;
This is my visitor
@Override
public String visitCurrentGuid(CalcParser.CurrentGuidContext ctx) {
    System.out.println("Guid is : " + ctx.getText());
    System.out.println("Guid is : " + ctx.getChild(0));
    return ctx.getText();
}
With the input "self(5827389b-c8ab-4804-8194-e23fbdd1e370)", there's only one child, which is the whole input itself: "self(5827389b-c8ab-4804-8194-e23fbdd1e370)".
How should I go about getting the GUID part?
From my understanding, if my grammar is structured to produce an AST, I should be able to print out the tree.
How should I update my grammar?
Thanks
Upvotes: 1
Views: 70
Reputation: 370112
Fragments don't appear at all in the AST - they're basically treated as if you'd written their contents directly in the lexer rule that uses them. So moving code into fragments makes your code easier to read, but does not affect the generated AST at all.
Lexer rules that are used by other lexer rules are also treated as fragments in that context. That is, if a lexer rule uses another lexer rule, it will still produce a single token with no nested structure - just as if you had used a fragment. The fact that it's a lexer rule and not a fragment only makes a difference when the pattern occurs on its own without being part of the larger pattern.
The key is that a lexer rule always produces a single token and tokens have no subtokens. They're the leaves of the AST. Nodes with children are generated from parser rules.
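To make the "fragments are inlined" point concrete, here is a rough analogy in plain Java using java.util.regex. It is not ANTLR, and the class and method names are made up for illustration: each lexer rule is modelled as a regex, and a rule that references another rule simply pastes that rule's pattern in place. Matching the SELF pattern yields one flat string with no sub-parts, which is what the SELF token looks like in the tree.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough analogy only, NOT ANTLR: each lexer rule is modelled as a regex string,
// and a rule that uses another rule pastes that rule's pattern in place,
// the way ANTLR inlines fragments.
class FlatTokenDemo {
    // fragment GUIDBLOCK : four letters or digits
    static final String GUIDBLOCK = "[A-Za-z0-9]{4}";

    // GUID : GUIDBLOCK GUIDBLOCK '-' GUIDBLOCK '-' GUIDBLOCK '-' GUIDBLOCK '-' GUIDBLOCK GUIDBLOCK GUIDBLOCK
    static final String GUID = GUIDBLOCK + GUIDBLOCK + "-" + GUIDBLOCK + "-" + GUIDBLOCK + "-"
            + GUIDBLOCK + "-" + GUIDBLOCK + GUIDBLOCK + GUIDBLOCK;

    // SELF : 'self(' GUID ')'
    static final Pattern SELF = Pattern.compile("self\\(" + GUID + "\\)");

    // Returns the text of the single SELF "token", or null if the input does not match.
    static String lexSelf(String input) {
        Matcher m = SELF.matcher(input);
        // The pattern has no capturing groups, so the match is one flat string:
        // nothing records where GUID or each GUIDBLOCK started or ended.
        return m.matches() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "self(5827389b-c8ab-4804-8194-e23fbdd1e370)";
        System.out.println("SELF token text: " + lexSelf(input));
        System.out.println("capturing groups: " + SELF.matcher(input).groupCount());
    }
}
```

Just as the regex match only gives back the whole string, the SELF token only gives back its whole text; recovering the GUID from it would mean re-parsing that string by hand.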
The only parser rule that you have is atom, and atom only invokes one other rule, SELF. So the generated tree will consist of an atom node that contains as its only child a SELF token, and, as previously stated, tokens are leaves, so that's the end of the tree.
What you probably want to do to get a useful tree is to make GUIDBLOCK a regular, non-fragment lexer rule (your only lexer rule, in fact) and turn everything else into parser rules. Since parser rule names must start with a lowercase letter, SELF and GUID become self and guid. That'd also mean that you can get rid of atom (possibly renaming self to atom if you want).
Then you'll end up with a tree consisting of a self (or atom, if you renamed it) node that contains as its children a 'self(' token, a guid node (to which you might want to assign a label for easy access), and a ')' token. The guid node in turn would contain a sequence of GUIDBLOCK and '-' tokens. You can also add blocks+= before every use of GUIDBLOCK to get a list that contains only the GUIDBLOCK tokens, without the dashes.
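The sketch below is a hand-written recursive-descent stand-in, not ANTLR-generated code, for the restructured grammar. The grammar in the comment, the label name value, and all Java class and method names are assumptions for illustration. It shows what the guid node would hold: its children are the GUIDBLOCK and '-' tokens in order, and the blocks list (the analogue of blocks+=) keeps only the GUIDBLOCK tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in (NOT ANTLR-generated code) for a grammar along the lines of:
//   self : 'self(' value=guid ')' ;
//   guid : blocks+=GUIDBLOCK blocks+=GUIDBLOCK '-' blocks+=GUIDBLOCK '-' blocks+=GUIDBLOCK '-'
//          blocks+=GUIDBLOCK '-' blocks+=GUIDBLOCK blocks+=GUIDBLOCK blocks+=GUIDBLOCK ;
//   GUIDBLOCK : [A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9] ;
class GuidTreeDemo {
    // Node for the guid rule: every child token, plus the labelled blocks list.
    static final class GuidNode {
        final List<String> children = new ArrayList<>(); // GUIDBLOCK and '-' tokens, in order
        final List<String> blocks = new ArrayList<>();   // what blocks+= would collect
    }

    private final String input;
    private int pos = 0;

    GuidTreeDemo(String input) { this.input = input; }

    // Match a literal token such as 'self(', '-' or ')'.
    private String literal(String text) {
        if (!input.startsWith(text, pos)) {
            throw new IllegalArgumentException("expected '" + text + "' at " + pos);
        }
        pos += text.length();
        return text;
    }

    // Match one GUIDBLOCK token: exactly four ASCII letters or digits.
    private String guidBlock() {
        if (pos + 4 > input.length() || !input.substring(pos, pos + 4).matches("[A-Za-z0-9]{4}")) {
            throw new IllegalArgumentException("expected GUIDBLOCK at " + pos);
        }
        String text = input.substring(pos, pos + 4);
        pos += 4;
        return text;
    }

    // guid rule: groups of 2-1-1-1-3 GUIDBLOCKs (8-4-4-4-12 characters) separated by '-'.
    GuidNode guid() {
        GuidNode node = new GuidNode();
        int[] groups = {2, 1, 1, 1, 3};
        for (int g = 0; g < groups.length; g++) {
            if (g > 0) node.children.add(literal("-"));
            for (int i = 0; i < groups[g]; i++) {
                String block = guidBlock();
                node.children.add(block);
                node.blocks.add(block);
            }
        }
        return node;
    }

    // self rule: 'self(' guid ')' ; returns the guid child (the "value" label).
    GuidNode self() {
        literal("self(");
        GuidNode value = guid();
        literal(")");
        if (pos != input.length()) throw new IllegalArgumentException("trailing input at " + pos);
        return value;
    }

    public static void main(String[] args) {
        GuidNode guid = new GuidTreeDemo("self(5827389b-c8ab-4804-8194-e23fbdd1e370)").self();
        System.out.println("guid children: " + guid.children);
        System.out.println("blocks only:   " + guid.blocks);
        System.out.println("guid text:     " + String.join("", guid.children));
    }
}
```

With the real ANTLR-generated Java parser, labels become fields on the context objects, so (assuming a grammar named Calc and the labels in the comment above) a visitSelf(CalcParser.SelfContext ctx) method could read ctx.value.getText() for the whole GUID, or ctx.value.blocks, a List<Token>, for just the blocks.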
It might also make sense to turn 'self(' into two tokens (i.e. 'self' '('), especially if you ever want to add a rule to ignore whitespace.
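Here is a similar hand-written lexer sketch (again not ANTLR; the WS rule in the comment and all Java names are assumptions) for the split-token variant. Because whitespace is skipped between tokens, "self ( ... )" and "self(...)" produce the same token stream, whereas a single 'self(' literal would not match when there is a space between self and (.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer (NOT ANTLR) for a split-token variant such as:
//   self : 'self' '(' guid ')' ;
//   GUIDBLOCK : [A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9] ;
//   WS : [ \t\r\n]+ -> skip ;
class SplitTokenLexer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                pos++;                                   // WS -> skip: no token produced
            } else if (c == '(' || c == ')' || c == '-') {
                tokens.add(String.valueOf(c));           // single-character literal tokens
                pos++;
            } else if (input.startsWith("self", pos)) {
                tokens.add("self");                      // keyword, checked before GUIDBLOCK
                pos += 4;
            } else if (pos + 4 <= input.length()
                    && input.substring(pos, pos + 4).matches("[A-Za-z0-9]{4}")) {
                tokens.add(input.substring(pos, pos + 4)); // GUIDBLOCK
                pos += 4;
            } else {
                throw new IllegalArgumentException("unexpected input at " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("self(5827389b-c8ab-4804-8194-e23fbdd1e370)"));
        System.out.println(tokenize("self ( 5827389b-c8ab-4804-8194-e23fbdd1e370 )"));
    }
}
```

One side effect worth knowing: once whitespace is skipped and guid is a parser rule, whitespace between GUID blocks (e.g. "5827 389b-...") would be accepted too, since the parser never sees the skipped whitespace.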
Upvotes: 1