mliebelt

Reputation: 15525

Parslet grammar for rules that start identically

I want to write a parser for so-called Subversion config auth files (see path-based authorization in the Subversion red book). There I want to define rules for directories like:

[/]
* = r
[/trunk]
@PROJECT = rw

The part of the grammar I have problems with is the path definition. I currently have the following rules in Parslet:

rule(:auth_rule_head) { (str('[') >> path >> str(']') >> newline).as(:arh) }
rule(:top)          { (str('/')).as(:top) }
rule(:path)         { (top | ((str('/') >> path_ele).repeat)).as(:path) }
rule(:path_ele)     { ((str('/').absent? >> any).repeat).as(:path_ele) }

So I want to distinguish two cases, and the problematic rule seems to be path, which defines them as an alternative: either / alone, or something like /trunk.

I have defined test cases for those, and get the following error when running the test case:

Failed to match sequence (SPACES '[' PATH ']' NEWLINE) at line 1 char 3.
`- Expected "]", but got "t" at line 1 char 3.

So the problem seems to be that the alternative in rule :path always chooses top.

What is a solution (as a grammar) for this problem? I think there should be one, because this looks like a common situation that comes up every now and then. I am not an expert at all with PEG parsers or parser / compiler generation, so if this is a fundamental problem that cannot be solved, I would like to know that as well.

Upvotes: 1

Views: 77

Answers (2)

mliebelt

Reputation: 15525

It seems I did not understand the problem correctly. I tried to reproduce it by creating a small example grammar with some unit tests, but there the grammar works.

If you are interested, have a look at the gist https://gist.github.com/mliebelt/a36ace0641e61f49d78f. You should be able to download the file and run it directly from the command line. You have to install parslet first; minitest should already be included in a current Ruby version.

There I only added the (missing) rule for newline and three unit tests that cover all cases:

  • The root: /
  • A path with only one element: /my
  • A path with more than one element: /my/path

This works as expected, so I get two cases here:

  • Top element only
  • One or more path elements

Perhaps this helps others debug a situation like that.

Upvotes: 0

Nigel Thorne

Reputation: 21548

In short: Swap the OR conditions around.

Parslet rules consume the input stream until they get a match, then they stop. If you have two possible options (an OR), the first is tried, and only if it doesn't match is the second tried.

In your case, as all your paths start with '/' they all match the first part of the path rule, so the second half is never explored.

You need to try to match the full path first, and only match the 'top' if it fails.

# changing this
rule(:path)         { (top | ((str('/') >> path_ele).repeat)).as(:path) }

# to this
rule(:path)         { (((str('/') >> path_ele).repeat) | top).as(:path) }

# fixes your first problem :)
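
To see why the order matters, here is a minimal sketch of PEG-style ordered choice written with plain Ruby lambdas. It is not Parslet's implementation; the helper names (lit, seq, alt, at_least_one, path_element, head, full_match?) are made up for this illustration, and the path element is assumed to contain neither '/' nor ']'.

```ruby
# Hand-rolled PEG-style combinators for illustration only; this is NOT Parslet's code.
# A parser is a lambda (input, pos) -> new position, or nil when it fails.

# Match a literal string.
def lit(text)
  ->(input, pos) { input[pos, text.length] == text ? pos + text.length : nil }
end

# Match one or more characters that are neither '/' nor ']' (assumed path element).
def path_element
  ->(input, pos) { (m = /\G[^\/\]]+/.match(input, pos)) ? m.end(0) : nil }
end

# Sequence: every parser must match, one after the other.
def seq(*parsers)
  ->(input, pos) { parsers.reduce(pos) { |at, parser| at && parser.call(input, at) } }
end

# Ordered choice: the first parser that matches wins; later ones are never retried.
def alt(*parsers)
  lambda do |input, pos|
    result = nil
    parsers.each { |parser| break if (result = parser.call(input, pos)) }
    result
  end
end

# One or more repetitions (the inner parser must consume input on every match).
def at_least_one(parser)
  lambda do |input, pos|
    count = 0
    while (nxt = parser.call(input, pos))
      pos = nxt
      count += 1
    end
    count.positive? ? pos : nil
  end
end

TOP      = lit('/')
SEGMENTS = at_least_one(seq(lit('/'), path_element))

# '[' path ']' with a pluggable path parser.
def head(path)
  seq(lit('['), path, lit(']'))
end

TOP_FIRST      = head(alt(TOP, SEGMENTS)) # like: top | segments
SEGMENTS_FIRST = head(alt(SEGMENTS, TOP)) # like: segments | top

# True when the parser consumes the whole input.
def full_match?(parser, input)
  parser.call(input, 0) == input.length
end

puts full_match?(TOP_FIRST, '[/trunk]')      # false: top took '/', then ']' was expected
puts full_match?(SEGMENTS_FIRST, '[/trunk]') # true
puts full_match?(SEGMENTS_FIRST, '[/]')      # true: segments fail, so top is tried
```

With top first, the choice commits to the single '/' and the surrounding sequence then fails at the 't', which mirrors the error message in the question.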

Also, be careful with rules that can match without consuming anything, especially inside a loop. By default, repeat means repeat(0), which also matches zero occurrences; usually you need repeat(1).

rule(:path)         { (((str('/') >> path_ele).repeat(1)) | top).as(:path) }
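
The effect of a zero-minimum repetition as the first branch can be imitated with plain Ruby regular expressions, because an atomic group (?>...) also commits to the first branch that matches and never backtracks into it. This is only an analogy, not Parslet code; the constant names and the character class for a path element (no '/' and no ']') are assumptions made for the sketch.

```ruby
# Analogy only: atomic groups (?>...) commit to the first branch that matches,
# similar to a PEG ordered choice. This is not how Parslet is implemented.

# First branch repeats zero or more times, so it always matches (possibly empty)
# and the second branch '/' is never tried.
ZERO_OR_MORE = %r{\A\[(?>(?:/[^/\]]+)*|/)\]\z}

# First branch needs at least one segment, so it can fail and '/' gets its turn.
ONE_OR_MORE = %r{\A\[(?>(?:/[^/\]]+)+|/)\]\z}

puts ZERO_OR_MORE.match?('[/trunk]') # true
puts ZERO_OR_MORE.match?('[/]')      # false: the empty match wins, then ']' is expected
puts ONE_OR_MORE.match?('[/trunk]')  # true
puts ONE_OR_MORE.match?('[/]')       # true
```

The same reasoning applies to the rule above: with a minimum of one repetition the first branch can actually fail, and only then is top considered.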

Also:

Is "top" really a special case? All paths end in a "/", so top is just the zero length path.

rule(:path)         { (path_ele.repeat(0)  >> str('/')).as(:path) }

Or

rule(:path)         { (str('/') >> path_ele.repeat(0)).as(:path) }
rule(:path_ele)     { ((str('/').absent? >> any).repeat(0)).as(:path_ele) >> str('/') } 
# assuming "//" is valid otherwise repeat(1)

Upvotes: 1
