gbajson
gbajson

Reputation: 1730

Pyparsing - grammar with recursion

I am trying to create grammar which will parse the following expressions:

  1. func()

  2. func(a)

  3. func(a) + func(b)

  4. func(func(a) + func()) + func(b)

I implemented it for (1) and (2), but once I extended rvalue << (identifier | function_call) by operation, it stopped working due to:

Exception raised:Expected W:(ABCD...), found ')'  (at char 5), (line:1, col:6)
Exception raised:maximum recursion depth exceeded

Can anyone of you explain why? As far as I understood in expression rvalue << (identifier | function_call | operation) function_call should be matched before operation and the recursion shouldn't take place.

Code:

from pyparsing import Forward, Optional, Word, Literal, alphanums, delimitedList

rvalue = Forward()

operation = rvalue + Literal('+') + rvalue
identifier = Word(alphanums + '_')('identifier')
function_args = delimitedList(rvalue)('function_args')

function_name = identifier('function_name')
function_call = (
    (function_name + Literal("(") + Optional(function_args) + Literal(")"))
)('function_call')

rvalue << (identifier | function_call | operation)
function_call.setDebug()


def test_function_call_no_args():
    bdict = function_call.parseString("func()", parseAll=True).asDict()
    assert bdict['function_name'] == 'func'
    assert 'function_args' not in bdict


def test_function_call_one_arg():
    bdict = function_call.parseString("func(arg)", parseAll=True).asDict()
    assert bdict['function_name'] == 'func'
    assert 'function_args' in bdict


def test_function_call_many_args():
    bdict = function_call.parseString("func(arg1, arg2)", parseAll=True).asDict()
    assert bdict['function_name'] == 'func'
    assert 'function_args' in bdict

Upvotes: 2

Views: 221

Answers (1)

sepp2k
sepp2k

Reputation: 370102

As far as I understood in expression rvalue << (identifier | function_call | operation) function_call should be matched before operation and the recursion shouldn't take place.

The recursion doesn't take place if one of the previous alternatives succeeds. But if both fail, operation is tried and you get infinite recursion.

For example, in test_function_call_no_args you try to parse func() using the function_call rule. This will parse func as the name of the function and ( as the beginning of the argument list. Then it will try to parse Optional(function_args), which will in turn try to parse delimitedList(rvalue). Now this will try to parse rvalue and since ) doesn't match the first two alternatives, it will try the last one, which will cause the infinite recursion.

When a rule is recursive, you must always consume input before the recursion is reached - it must not be possible to reach the recursion without consuming input. So having the recursion come last in an alternative isn't enough - there must actually be another non-optional rule (that also doesn't match the empty string) be successfully invoked before it.

PS: rvalue as it is can actually never match a function call because function calls start with an identifier and you match identifier first.

Upvotes: 2

Related Questions