user3642381
user3642381

Reputation: 23

ANTLR4 How would I extract python expression variables

Using the following ANTLR grammar: https://github.com/bkiers/python3-parser/blob/master/src/main/antlr4/nl/bigo/pythonparser/Python3.g4 I want to parse from a given expression, lets say:

x.split(y, 3)

or

x + y

The variables x and y. How would I achieve this?

I tried the following approach but it seems cumbersome since I must add all build-in python functions:

Define a Listener interface

const listener = new MyPythonListener()
antlr.tree.ParseTreeWalker.DEFAULT.walk(listener, abstractTree)

Use regex + pattern matching:

const symbolicNames = ['TRUE', 'FALSE', 'NUMEBRS', 'STRING', 'LIST', 'TUPLE', 'DICTIONARY', 'INT', 'LONG', 'FLOAT', 'COMPLEX',
'BOOL', 'STR', 'INT', 'RANGE', 'NONE', 'LEN']

class MyPythonListener extends Python3Listener {
    variables = []

    enterExpr(ctx) {
        const text = this.getElementText(ctx)
        if (text && this.verifyIsVariable(text)) {
            this.variables.push(text)
        }
    }

    verifyIsVariable(leafText) {
        return !leafText.includes('"') && !leafText.includes('\'') && isNaN(leafText) &&
            !symbolicNames.includes(leafText.toUpperCase()) && leafText.match(/^[0-9a-zA-Z_]+$/)
    }
}

Upvotes: 0

Views: 1024

Answers (1)

Bart Kiers
Bart Kiers

Reputation: 170207

I didn't look too closely at it, but after inspecting the parse tree for the Python code:

def some_method_name(some_param_name):
    x.split(y, 3)

it appears that the variable names are children of the atom rule:

atom
 : '(' ( yield_expr | testlist_comp )? ')' 
 | '[' testlist_comp? ']'  
 | '{' dictorsetmaker? '}' 
 | NAME 
 | number 
 | str+ 
 | '...' 
 | NONE
 | TRUE
 | FALSE
 ;

where NAME is a variable name.

So you could do something like this:

String source = "def some_method_name(some_param_name):\n    x.split(y, 3)\n";
Python3Lexer lexer = new Python3Lexer(CharStreams.fromString(source));
Python3Parser parser = new Python3Parser(new CommonTokenStream(lexer));

ParseTreeWalker.DEFAULT.walk(new Python3BaseListener() {
    @Override
    public void enterAtom(Python3Parser.AtomContext ctx) {
        if (ctx.NAME() != null) {
            System.out.println(ctx.NAME().getText());
        }
    }
}, parser.file_input());

which will print:

x
y

and not the method and parameter names.

Again: not thoroughly tested, I leave that for you. You can pretty print the parse tree like this:

String source = "def some_method_name(some_param_name):\n    x.split(y, 3)\n";
Python3Lexer lexer = new Python3Lexer(CharStreams.fromString(source));
Python3Parser parser = new Python3Parser(new CommonTokenStream(lexer));

System.out.println(new Builder.Tree(source).toStringASCII());

to inspect for yourself where the nodes you're intereseted in occur in the parse tree. The code above will print:

'- file_input
   |- stmt
   |  '- compound_stmt
   |     '- funcdef
   |        |- def
   |        |- some_method_name
   |        |- parameters
   |        |  |- (
   |        |  |- typedargslist
   |        |  |  '- tfpdef
   |        |  |     '- some_param
   |        |  '- )
   |        |- :
   |        '- suite
   |           |- <NEWLINE>
   |           |- <INDENT>
   |           |- stmt
   |           |  '- simple_stmt
   |           |     |- small_stmt
   |           |     |  '- expr_stmt
   |           |     |     '- testlist_star_expr
   |           |     |        '- test
   |           |     |           '- or_test
   |           |     |              '- and_test
   |           |     |                 '- not_test
   |           |     |                    '- comparison
   |           |     |                       '- star_expr
   |           |     |                          '- expr
   |           |     |                             '- xor_expr
   |           |     |                                '- and_expr
   |           |     |                                   '- shift_expr
   |           |     |                                      '- arith_expr
   |           |     |                                         '- term
   |           |     |                                            '- factor
   |           |     |                                               '- power
   |           |     |                                                  |- atom
   |           |     |                                                  |  '- x
   |           |     |                                                  |- trailer
   |           |     |                                                  |  |- .
   |           |     |                                                  |  '- split
   |           |     |                                                  '- trailer
   |           |     |                                                     |- (
   |           |     |                                                     |- arglist
   |           |     |                                                     |  |- argument
   |           |     |                                                     |  |  '- test
   |           |     |                                                     |  |     '- or_test
   |           |     |                                                     |  |        '- and_test
   |           |     |                                                     |  |           '- not_test
   |           |     |                                                     |  |              '- comparison
   |           |     |                                                     |  |                 '- star_expr
   |           |     |                                                     |  |                    '- expr
   |           |     |                                                     |  |                       '- xor_expr
   |           |     |                                                     |  |                          '- and_expr
   |           |     |                                                     |  |                             '- shift_expr
   |           |     |                                                     |  |                                '- arith_expr
   |           |     |                                                     |  |                                   '- term
   |           |     |                                                     |  |                                      '- factor
   |           |     |                                                     |  |                                         '- power
   |           |     |                                                     |  |                                            '- atom
   |           |     |                                                     |  |                                               '- y
   |           |     |                                                     |  |- ,
   |           |     |                                                     |  '- argument
   |           |     |                                                     |     '- test
   |           |     |                                                     |        '- or_test
   |           |     |                                                     |           '- and_test
   |           |     |                                                     |              '- not_test
   |           |     |                                                     |                 '- comparison
   |           |     |                                                     |                    '- star_expr
   |           |     |                                                     |                       '- expr
   |           |     |                                                     |                          '- xor_expr
   |           |     |                                                     |                             '- and_expr
   |           |     |                                                     |                                '- shift_expr
   |           |     |                                                     |                                   '- arith_expr
   |           |     |                                                     |                                      '- term
   |           |     |                                                     |                                         '- factor
   |           |     |                                                     |                                            '- power
   |           |     |                                                     |                                               '- atom
   |           |     |                                                     |                                                  '- number
   |           |     |                                                     |                                                     '- integer
   |           |     |                                                     |                                                        '- 3
   |           |     |                                                     '- )
   |           |     '- <NEWLINE>
   |           '- <DEDENT>
   '- <EOF>

Note that the Builder.Tree class is not part of the ANTLR library, it resides in the/my repo you linked to in your question: https://github.com/bkiers/python3-parser/blob/master/src/main/java/nl/bigo/pythonparser/Builder.java

Upvotes: 1

Related Questions