MariaH

Reputation: 341

Using ANTLR to get identifiers and function names

I'm new to ANTLR and trying to understand it. My goal is to read a source file written in C and extract its identifiers (variable and function names).

My C grammar (file C.g4) contains these rules:

identifierList
    :   Identifier
    |   identifierList Comma Identifier
    ;
Identifier
    :   IdentifierNondigit
        (   IdentifierNondigit
        |   Digit
        )*
    ;

After generating the parser and listener, I wrote my own listener for the identifierList rule.

Note that MyCListener class extends CBaseListener:

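import java.util.List;
import org.antlr.v4.runtime.tree.ParseTree;

public class MyCListener extends CBaseListener {

    // Print the text of every direct child of each identifierList node.
    @Override
    public void enterIdentifierList(CParser.IdentifierListContext ctx) {
        List<ParseTree> children = ctx.children;
        for (ParseTree parseTree : children) {
            System.out.println(parseTree.getText());
        }
    }
}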

Then I have this in my main class:

String fileurl = "C:/example.c";

CLexer lexer;
try {
    lexer = new CLexer(new ANTLRFileStream(fileurl));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    CParser parser = new CParser(tokens);

    CParser.IdentifierListContext identifierContext = parser.identifierList();
    ParseTreeWalker walker = new ParseTreeWalker();
    MyCListener listener = new MyCListener();
    walker.walk(listener, identifierContext);

} catch (IOException ex) {
    Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
}

Where example.c is:

int main() {

// this is C

 int i=0; // i is int
 /* double j=0.0;
    C
 */
}

What am I doing wrong? Maybe I didn't write MyCListener properly, or identifierList isn't the rule I need to listen to. I don't understand my output either. Why is there a lexical error?

line 3:4 mismatched input '(' expecting {<EOF>, ','}
main
(
)
{
int
i
=
0
;
}

As you can see, I'm very confused about this. Can somebody help me, please?

Upvotes: 3

Views: 3462

Answers (1)

Bart Kiers

Reputation: 170148

With this line:

CParser.IdentifierListContext identifierContext = parser.identifierList();

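you're trying to parse your entire input as an identifierList. But your input isn't just that: it is a complete C source file, not a comma-separated list of identifiers. The message you see is a syntax error reported by the parser, not a lexical error.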
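
To make that concrete, here is a hand-written toy matcher. It is not ANTLR and does not reflect how ANTLR works internally; it only assumes a simplified rule of the form Identifier (',' Identifier)* followed by end of input. The names IdentifierListDemo, parseIdentifierList and isIdentifier are made up for illustration, and the error text is copied from your output:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IdentifierListDemo {

    // Toy stand-in for the Identifier lexer rule.
    // Assumption: C keywords such as "int" are not treated specially here.
    static boolean isIdentifier(String token) {
        return token.matches("[A-Za-z_][A-Za-z_0-9]*");
    }

    // Matches: Identifier (',' Identifier)* <end of input>
    // Returns the identifiers, or throws on the first token that does not fit.
    static List<String> parseIdentifierList(List<String> tokens) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        if (pos >= tokens.size() || !isIdentifier(tokens.get(pos))) {
            throw new IllegalStateException("expecting Identifier at token " + pos);
        }
        found.add(tokens.get(pos++));
        while (pos < tokens.size()) {
            // After an identifier, only ',' or the end of input is acceptable.
            if (!tokens.get(pos).equals(",")) {
                throw new IllegalStateException(
                        "mismatched input '" + tokens.get(pos) + "' expecting {<EOF>, ','}");
            }
            pos++;
            if (pos >= tokens.size() || !isIdentifier(tokens.get(pos))) {
                throw new IllegalStateException("expecting Identifier at token " + pos);
            }
            found.add(tokens.get(pos++));
        }
        return found;
    }

    public static void main(String[] args) {
        // A real identifier list is accepted.
        System.out.println(parseIdentifierList(Arrays.asList("a", ",", "b")));
        // The start of a function definition is not.
        try {
            parseIdentifierList(Arrays.asList("main", "(", ")"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The first call prints [a, b]; the second stops at '(' with the same kind of complaint you got, because a function definition simply isn't an identifier list.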

Assuming you're using the C.g4 from the ANTLR4 GitHub repository, let the parser start at the entry point of the grammar (the rule compilationUnit) instead:

MyCListener listener = new MyCListener();
ParseTreeWalker.DEFAULT.walk(listener, parser.compilationUnit());

EDIT

Here's a quick demo:

import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class Main {

    public static void main(String[] args) throws Exception {

        final List<String> identifiers = new ArrayList<String>();

        String source = "int main() {\n" +
                "\n" +
                "// this is C\n" +
                "\n" +
                " int i=0; // i is int\n" +
                " /* double j=0.0;\n" +
                "    C\n" +
                " */\n" +
                "}";

        CLexer lexer = new CLexer(new ANTLRInputStream(source));
        CParser parser = new CParser(new CommonTokenStream(lexer));

        // Parse from the grammar's entry rule and walk the resulting tree.
        ParseTreeWalker.DEFAULT.walk(new CBaseListener(){

            // Called for every directDeclarator; only some of them carry an Identifier.
            @Override
            public void enterDirectDeclarator(CParser.DirectDeclaratorContext ctx) {
                if (ctx.Identifier() != null) {
                    identifiers.add(ctx.Identifier().getText());
                }
            }

            // Perhaps override other rules that use `Identifier`

        }, parser.compilationUnit());

        System.out.println("identifiers -> " + identifiers);
    }
}

which would print:

identifiers -> [main, i]

Upvotes: 5
