Pisut Hutton

Reputation: 47

Antlr4 - Get function's param(s) Java

I have the following grammar:

GUID  : GUIDBLOCK GUIDBLOCK '-' GUIDBLOCK '-' GUIDBLOCK '-' GUIDBLOCK '-'
        GUIDBLOCK GUIDBLOCK GUIDBLOCK;

SELF  : 'self(' GUID ')';

fragment GUIDBLOCK: [A-Za-z0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9];

atom : SELF # CurrentGuid ;

This is my visitor

@Override
public String visitCurrentGuid(CalcParser.CurrentGuidContext ctx) {
    System.out.println("Guid is : " + ctx.getText());
    System.out.println("Guid is : " + ctx.getChild(0));
    return ctx.getText();
}

With the input "self(5827389b-c8ab-4804-8194-e23fbdd1e370)",

there is only one child, which is the whole input itself: "self(5827389b-c8ab-4804-8194-e23fbdd1e370)"

How should I go about getting the GUID part?

From my understanding, if my grammar were structured so that it produced an AST, I should be able to print out the tree.

How should I update my grammar?

Thanks

Upvotes: 1

Views: 70

Answers (1)

sepp2k

Reputation: 370112

Fragments don't appear at all in the AST - they're basically treated as if you'd written their contents directly in the lexer rule that uses them. So moving code into fragments makes your code easier to read, but does not affect the generated AST at all.

Lexer rules that are used by other lexer rules are also treated as fragments in that context. That is, if a lexer rule uses another lexer rule, it will still produce a single token with no nested structure - just as if you had used a fragment. The fact that it's a lexer rule and not a fragment only makes a difference when the pattern occurs on its own without being part of the larger pattern.

The key is that a lexer rule always produces a single token and tokens have no subtokens. They're the leaves of the AST. Nodes with children are generated from parser rules.

The only parser rule that you have is atom, and atom references only one other rule, SELF. So the generated tree will consist of an atom node that contains a SELF token as its only child and, as previously stated, tokens are leaves, so that's the end of the tree.

What you probably want to do to get a useful tree is to make GUIDBLOCK a lexer rule (your only lexer rule, in fact) and turn everything else into parser rules. That'd also mean that you can get rid of atom (possibly renaming SELF to atom if you want).

Then you'll end up with a tree consisting of a self (or atom if you renamed it) node that contains as its children a 'self(' token, a guid node (which you might want to assign a label for easy access) and a ')' token. The guid node in turn would contain a sequence of GUIDBLOCK and '-' tokens. You can also add blocks+= before every use of GUIDBLOCK to get a list that contains only the GUIDBLOCK tokens, without the dashes.
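As a rough sketch of that restructuring (not tested against your full grammar): the grammar name Calc is inferred from CalcParser in your visitor, the label value is a name I made up for illustration, and I kept the rule name atom with its # CurrentGuid label so that the visitor method keeps its name.

```antlr
grammar Calc;

// Parser rule: produces a node whose children are 'self(', a guid node and ')'.
// "value" is a hypothetical label chosen for this sketch.
atom : 'self(' value=guid ')' # CurrentGuid ;

// Parser rule: its children are the individual GUIDBLOCK and '-' tokens.
// blocks+= collects only the GUIDBLOCK tokens into a list.
guid : blocks+=GUIDBLOCK blocks+=GUIDBLOCK '-' blocks+=GUIDBLOCK '-'
       blocks+=GUIDBLOCK '-' blocks+=GUIDBLOCK '-'
       blocks+=GUIDBLOCK blocks+=GUIDBLOCK blocks+=GUIDBLOCK ;

// The only explicit lexer rule: no longer a fragment, so it becomes a token.
GUIDBLOCK : [A-Za-z0-9] [A-Za-z0-9] [A-Za-z0-9] [A-Za-z0-9] ;
```

With a grammar along these lines, ctx.guid().getText() (or ctx.value.getText()) inside visitCurrentGuid should give you the GUID without the surrounding self( and ), and ctx.guid().blocks should hold the eight GUIDBLOCK tokens.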

It might also make sense to turn 'self(' into two tokens (i.e. 'self' '(') - especially if you ever want to add a rule to ignore whitespace.
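If you would rather keep SELF as a single lexer token for now, the only thing you can do in the visitor is slice the GUID out of the token's text yourself, since the token has no children. A minimal sketch in plain Java, assuming the text always has the exact shape self(...); the class and method names are hypothetical and not part of ANTLR:

```java
// Hypothetical helper, not part of ANTLR: works on the text of a SELF token,
// e.g. the string returned by ctx.getText() in the visitor.
final class SelfTokenText {
    private static final String PREFIX = "self(";
    private static final String SUFFIX = ")";

    // Returns whatever sits between "self(" and the closing ")".
    public static String extractGuid(String tokenText) {
        if (!tokenText.startsWith(PREFIX) || !tokenText.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("Not a self(...) token: " + tokenText);
        }
        return tokenText.substring(PREFIX.length(), tokenText.length() - SUFFIX.length());
    }

    public static void main(String[] args) {
        System.out.println(extractGuid("self(5827389b-c8ab-4804-8194-e23fbdd1e370)"));
    }
}
```

In the visitor that would amount to something like return SelfTokenText.extractGuid(ctx.getText());. It works, but it duplicates knowledge about the token's shape outside the grammar, which is why restructuring the rules is usually the cleaner route.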

Upvotes: 1
