Leon Starr

Reputation: 522

How best to parse a comma-separated list in a PEG grammar

I'm trying to parse a comma-separated list. To keep it simple, I'm just using digits. These expressions should be valid:

(1, 4, 3)

()

(4)

I can think of two ways to do this, and I'm wondering why exactly the failing example does not work. I believe it is correct BNF, but I can't get it to work as PEG. Can anyone explain why? I'm trying to get a better understanding of the PEG parsing logic.

I'm testing using the online browser parser generator here: https://pegjs.org/online

This does not work:

list = '(' some_digits? ')'
some_digits = digit / ', ' some_digits
digit = [0-9]

(Actually, the grammar itself builds fine, and it accepts () and (1), but it doesn't recognize (1, 2).)
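The behavior described above can be reproduced outside the browser with a small hand-written Python recognizer that follows PEG semantics. This is a sketch, not PEG.js itself, and the function names simply mirror the grammar rules. Each function takes the input and a position and returns the position after a match, or None on failure. Note how some_digits returns as soon as its first alternative, digit, succeeds.

```python
# Hand-written PEG-style recognizer for the question's first grammar:
#   list        = '(' some_digits? ')'
#   some_digits = digit / ', ' some_digits
#   digit       = [0-9]
DIGITS = "0123456789"

def digit(text, pos):
    # digit = [0-9]: match one ASCII digit
    if pos < len(text) and text[pos] in DIGITS:
        return pos + 1
    return None

def some_digits(text, pos):
    # Ordered choice: try digit first...
    end = digit(text, pos)
    if end is not None:
        return end  # ...and stop here as soon as it succeeds
    # Only reached when digit failed: try ', ' some_digits
    if text.startswith(", ", pos):
        return some_digits(text, pos + 2)
    return None

def parse_list(text):
    # list = '(' some_digits? ')', and the whole input must be consumed
    if not text.startswith("("):
        return False
    pos = 1
    end = some_digits(text, pos)
    if end is not None:  # '?' makes some_digits optional
        pos = end
    return text.startswith(")", pos) and pos + 1 == len(text)

for s in ["()", "(1)", "(1, 2)"]:
    print(s, parse_list(s))
# () True
# (1) True
# (1, 2) False
```

With the rule exactly as written, the second alternative only applies when the text at that point begins with ', ', so the sketch (like the grammar) also accepts something like (, 1) while still rejecting (1, 2).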

But this does work:

list = '(' some_digits? ')'
some_digits = digit another_digit*
another_digit = ', ' digit
digit = [0-9]
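The working grammar can be sketched the same way, again as a hypothetical hand-written Python recognizer rather than generated code. Here the repetition operator * keeps consuming ', ' digit pairs after the first digit until one no longer matches, so nothing ends the list after the first element.

```python
# Hand-written PEG-style recognizer for the question's working grammar:
#   list          = '(' some_digits? ')'
#   some_digits   = digit another_digit*
#   another_digit = ', ' digit
#   digit         = [0-9]
DIGITS = "0123456789"

def digit(text, pos):
    # digit = [0-9]: match one ASCII digit
    if pos < len(text) and text[pos] in DIGITS:
        return pos + 1
    return None

def another_digit(text, pos):
    # another_digit = ', ' digit (a sequence: both parts must match)
    if text.startswith(", ", pos):
        return digit(text, pos + 2)
    return None

def some_digits(text, pos):
    # some_digits = digit another_digit*
    pos = digit(text, pos)
    if pos is None:
        return None
    # '*' repeats greedily until another_digit fails; zero repetitions is fine
    while True:
        end = another_digit(text, pos)
        if end is None:
            return pos
        pos = end

def parse_list(text):
    # list = '(' some_digits? ')', and the whole input must be consumed
    if not text.startswith("("):
        return False
    pos = 1
    end = some_digits(text, pos)
    if end is not None:  # '?' makes some_digits optional
        pos = end
    return text.startswith(")", pos) and pos + 1 == len(text)

for s in ["(1, 4, 3)", "()", "(4)", "(1, )"]:
    print(s, parse_list(s))
```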

Why is that? (Grammar novice here)

Upvotes: 6

Views: 1216

Answers (2)

beemtee

Reputation: 941

In one line:

list = '(' (digit (', ' digit)*)? ')'

It depends on what you want to do with the values and on the PEG implementation, but extracting them may be easier this way.
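As a rough illustration of why this shape is convenient for extracting values, here is a Python sketch that walks the same structure and collects the digits as it goes. The function name parse_list and the choice to return None on failure are my own; like the question's some_digits?, it treats the digit sequence as optional so that () yields an empty list.

```python
DIGITS = "0123456789"

def parse_list(text):
    # Recognizes '(' (digit (', ' digit)*)? ')' and returns the digits as ints,
    # or None if the input does not match.
    if not text.startswith("("):
        return None
    pos = 1
    values = []
    if pos < len(text) and text[pos] in DIGITS:
        values.append(int(text[pos]))
        pos += 1
        # (', ' digit)*: the group only counts if ', ' is followed by a digit;
        # otherwise the parser stays before the ', '
        while (text.startswith(", ", pos) and pos + 2 < len(text)
               and text[pos + 2] in DIGITS):
            values.append(int(text[pos + 2]))
            pos += 3
    if text.startswith(")", pos) and pos + 1 == len(text):
        return values
    return None

print(parse_list("(1, 4, 3)"))  # [1, 4, 3]
print(parse_list("()"))         # []
```

In PEG.js itself you would attach actions to the rule to build the list; the sketch only mimics that result by hand.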

Upvotes: 1

James Wasson

Reputation: 514

Cool question! After digging around in the PEG.js docs for a second, I found that the / operator means:

Try to match the first expression, if it does not succeed, try the second one, etc. Return the match result of the first successfully matched expression. If no expression matches, consider the match failed.
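To make that rule concrete, here is a tiny parser-combinator sketch in Python. The names lit, choice and seq are made up for illustration and are not part of PEG.js. It shows the consequence of ordered choice: once an alternative succeeds, a failure later in the enclosing sequence does not send the parser back to try the remaining alternatives.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def lit(s: str) -> Parser:
    # Match the literal string s at pos.
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def choice(*alternatives: Parser) -> Parser:
    # PEG ordered choice ('/'): return the result of the first alternative
    # that succeeds; later alternatives are never tried after that.
    def parse(text: str, pos: int) -> Optional[int]:
        for alt in alternatives:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return parse

def seq(*parts: Parser) -> Parser:
    # Sequence: every part must match, one after another.
    def parse(text: str, pos: int) -> Optional[int]:
        for part in parts:
            end = part(text, pos)
            if end is None:
                return None
            pos = end
        return pos
    return parse

short_first = seq(choice(lit("a"), lit("ab")), lit("!"))
long_first = seq(choice(lit("ab"), lit("a")), lit("!"))

print(short_first("ab!", 0))  # None: choice commits to "a", then "!" fails at "b"
print(long_first("ab!", 0))   # 3
```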

So this led me to the solution:

list = '(' some_digits? ')'
some_digits = digit ', ' some_digits / digit
digit = [0-9]

The reason this works:

input: (1, 4)

  • eats '('
  • check whether there are some digits (some_digits is optional)
  • check some_digits - first condition:
    • eats '1'
    • eats ', '
    • check some_digits - first condition:
      • eats '4'
      • fails to eat ', '
    • backtrack to before '4', then check some_digits - second condition:
      • eats '4'
      • succeeds
    • succeeds
  • eats ')'
  • succeeds
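The walkthrough above can be reproduced with a small Python sketch of the same grammar that records each step in a list. It is hand-written to follow PEG semantics (not PEG.js output), and the trace wording only roughly matches the bullets above.

```python
DIGITS = "0123456789"

def parse_list(text):
    """Recognize text with the answer's grammar and return (ok, trace).

    list        = '(' some_digits? ')'
    some_digits = digit ', ' some_digits / digit
    digit       = [0-9]
    """
    trace = []

    def digit(pos):
        # digit = [0-9]
        if pos < len(text) and text[pos] in DIGITS:
            trace.append(f"eats {text[pos]!r}")
            return pos + 1
        return None

    def some_digits(pos):
        # First alternative: digit ', ' some_digits
        trace.append("some_digits: first alternative")
        end = digit(pos)
        if end is not None:
            if text.startswith(", ", end):
                trace.append("eats ', '")
                rest = some_digits(end + 2)
                if rest is not None:
                    return rest
            else:
                trace.append("fails to eat ', '")
        # The sequence failed: backtrack to pos and try the second alternative
        trace.append("some_digits: second alternative")
        return digit(pos)

    if not text.startswith("("):
        return False, trace
    trace.append("eats '('")
    pos = 1
    end = some_digits(pos)  # some_digits? is optional
    if end is not None:
        pos = end
    ok = text.startswith(")", pos) and pos + 1 == len(text)
    if ok:
        trace.append("eats ')'")
    return ok, trace

ok, steps = parse_list("(1, 4)")
print(ok)
for step in steps:
    print(step)
```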

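If you reverse the order of the some_digits alternatives, the first number it comes across gets eaten by digit and no recursion occurs. The ordered choice has already succeeded, so the parser never goes back to try the longer alternative. Then it throws an error because ')' is not present at that point (the next character is ',').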
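A quick way to check this is to run both orders side by side. The Python sketch below is a hand-written recognizer (not PEG.js), and the long_first flag is a hypothetical parameter that only switches the order of the two some_digits alternatives.

```python
DIGITS = "0123456789"

def parse_list(text, long_first=True):
    """Recognize '(' some_digits? ')' where some_digits is
    digit ', ' some_digits / digit   (long_first=True, the answer's order) or
    digit / digit ', ' some_digits   (long_first=False, the reversed order).
    """
    def digit(pos):
        # digit = [0-9]
        if pos < len(text) and text[pos] in DIGITS:
            return pos + 1
        return None

    def long_alt(pos):
        # digit ', ' some_digits
        end = digit(pos)
        if end is not None and text.startswith(", ", end):
            return some_digits(end + 2)
        return None

    def some_digits(pos):
        # Ordered choice: return the first alternative that succeeds.
        alternatives = (long_alt, digit) if long_first else (digit, long_alt)
        for alt in alternatives:
            end = alt(pos)
            if end is not None:
                return end
        return None

    if not text.startswith("("):
        return False
    pos = 1
    end = some_digits(pos)  # some_digits? is optional
    if end is not None:
        pos = end
    return text.startswith(")", pos) and pos + 1 == len(text)

print(parse_list("(1, 4)", long_first=True))   # True
print(parse_list("(1, 4)", long_first=False))  # False
```

With long_first=False the choice commits to the bare digit '1', the parser is then at ',' instead of ')', and the whole match fails, as described above.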

Upvotes: 5
