jabozzo
jabozzo

Reputation: 611

parsec.py recursive definitions

I'm loving the simplicity of this library. Sadly I haven't figure out how to make recursive definitions:

Consider this minimal contrived example:

import parsec as psc
digit = psc.regex("[0-9]")
number = lambda: digit ^ (digit + number())
_ = number().parse("42")

As expected, the evaluation of number() is infinite recursive:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
  File "<stdin>", line 1, in <lambda>
  File "<stdin>", line 1, in <lambda>
  [Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded

I know in this specific example it can be solved by changing line 3 to

number = lambda: psc.many1(digit)

but I'm not sure it's always possible for a general case.

Is it possible to make the parsing of number recursive as in

<number> ::= <digit><number> | <digit>

Upvotes: 4

Views: 341

Answers (2)

smheidrich
smheidrich

Reputation: 4519

(UPDATE) Disclaimer: As we found out in the comments, this only works with versions of parsec.py above 3.3, which (as of August 2018) is the latest release available on PyPI, so for now you have to manually install the development version from GitHub in order for this solution work.

UPDATE 2: parsec.py v3.4 has finally been released and fixes the issue.


In cases like these, I find that it is often helpful to "manually expand" the primitives provided by parsec and write a "low-level" parser of our own (i.e. one that takes text and index arguments, not one built up from parsec primitives), just to see what is going on. By taking the definition of try_choice (^) from the parsec source code and manually plugging in the constituents of your expression1, we can write a parser which I think does what you want:

import parsec as psc
digit = psc.regex("[0-9]")

def number():
  @psc.Parser
  def number_parser(text, index):
    res = (digit + number())(text, index)
    return res if res.status else digit(text, index)
  return number_parser

_ = number().parse("42423")
print(_)

Output:

('4', ('2', ('4', ('2', '3'))))

The reason this works is of course because the "parser-returning function" number() gets called only conditionally within the actual parser, not (unconditionally) within itself.

An easier way to write this is to not use a "parser-returning function" at all but just write the parser itself directly:

@psc.Parser
def number(text, index):
  return ((digit + number) ^ digit)(text, index)

The usage changes from number().parse("42423") to number.parse("42423") but it does the same thing.

Lastly, is there some functionality in parsec.py that lets you do this without the explicit text and index arguments, such that the parser is built up entirely from other parsec.py primitives? It turns out that parsec.generate fits the bill rather nicely! From its in-source documentation:

The most powerful way to construct a parser is to use the generate decorator. The generate creates a parser from a generator that should yield parsers. These parsers are applied successively and their results are sent back to the generator using .send() protocol. The generator should return the result or another parser, which is equivalent to applying it and returning its result.

You can find example usages here, from which it becomes clear that we can write:

@psc.generate
def number():
  r = yield (digit + number) ^ digit
  return r

This works because besides all the fancy generator magic it does, the generate decorator returns a function which is itself decorated with @parsec.Parser (cf. source linked above), so the resulting, fully decorated number function will be of the same kind as the one in the 2nd solution. Hence, we can use it in the same manner as digit etc. without having to call it or anything, which is what we do in the yield expression.

Usage is again just number.parse("42423") and it returns the same thing as the other two solutions.

Maybe there is a better solution out there but this is all I could come up with.


1 I had to reverse the order from digit ^ (digit + number()) to (digit + number()) ^ digit, because the first one returns just the first digit and then considers itself done. I hope this is ok?

Upvotes: 2

jabozzo
jabozzo

Reputation: 611

Another quick way to solve this problem if there are already a lot of parser-returning functions or just want to be consistent in style is to make the function lazy evaluate:

import parsec as psc

def lazy(fn):
  @psc.Parser
  def _lazy(text, index):
    return fn()(text, index)
  return _lazy

digit = lambda: psc.regex("[0-9]")
number = lambda: (digit() + lazy(number)) ^ digit()
print(number().parse("423242"))

Prints:

('4', ('2', ('3', ('2', ('4', '2')))))

Upvotes: 0

Related Questions