Luis

Reputation: 1293

Matching a string with a specific end in LPeg

I'm trying to capture a string with a combination of a's and b's but always ending with b. In other words:

local patt = S'ab'^0 * P'b'

matching aaab and bbabb but not aaa or bba. However, the pattern above does not match anything. Is this because S'ab'^0 is greedy and consumes the final b? I think so, and I can't think of any alternative except perhaps resorting to lpeg.Cmt, which seems like overkill. Does anyone know how to match such a pattern? I saw this question, but the solution there would stop at the first end marker (i.e. 'cat' there, 'b' here), and in my case I need to accept the b's in the middle.

P.S. What I'm actually trying to do is match an expression whose outermost rule is a function call. E.g.

func();
func(x)(y);
func_arr[z]();

all match but

exp;
func()[1];
4 + 5;

do not. The rest of my grammar works and I'm pretty sure this boils down to the same issue but for completeness, the grammar I'm working with looks something like:

top_expr = V'primary_expr' * V'postfix_op'^0 * V'func_call_op' * P';';
postfix_op = V'func_call_op' + V'index_op';

And similarly the V'postfix_op'^0 eats up the func_call_op I'm expecting at the end.

Upvotes: 1

Views: 529

Answers (3)

Brynne Taylor

Reputation: 116

Sorry that my answer comes so late, but I think this question deserves a more accurate answer.

As I understand it, you want a non-blind greedy match. Unfortunately, the official LPeg documentation only describes blind greedy matching (repetition). A non-blind greedy match can still be described by a parsing expression grammar: to match as many E1 as possible followed by E2, write the rule S as

S <- E1 S / E2

The solution to the a/b problem becomes

S <- [ab] S / 'b'
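As a minimal sketch, the rule above could be written as an LPeg grammar like this. The rule name ends_in_b is just an illustrative choice (it avoids a clash with lpeg.S), and the trailing P(-1) is my own addition so that the whole subject has to be consumed:

```lua
local lpeg = require "lpeg"
local P, S, V = lpeg.P, lpeg.S, lpeg.V

-- ends_in_b <- [ab] ends_in_b / 'b'
-- The recursive alternative is tried first; when it fails further along,
-- the ordered choice falls back to matching a single 'b' at that position.
local ends_in_b = P{
  "ends_in_b",
  ends_in_b = S"ab" * V"ends_in_b" + P"b",
}

-- Assumption: the match should cover the entire subject.
local whole = ends_in_b * P(-1)

print(whole:match("aaab"))   --> 5
print(whole:match("bbabb"))  --> 6
print(whole:match("aaa"))    --> nil
print(whole:match("bba"))    --> nil
```

Without the P(-1), the grammar matches the longest prefix that ends in b, so "bba" would match its first two characters.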

You might want to optimize the rule by consuming a run of a's in the first alternative

S <- [ab] 'a'* S / 'b'

which reduces the number of recursive calls considerably. As for your real problem, here's my answer:

top_expr   <- primary_expr p_and_f ';'
p_and_f    <- postfix_op p_and_f / func_call_op
postfix_op <- func_call_op / index_op
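Here is a rough LPeg sketch of these three rules. The definitions of primary_expr, func_call_op and index_op are simplified placeholders that I made up so the example runs (identifiers or integers, no whitespace handling); your real grammar will have its own versions. The final P(-1) is also an assumption, so that nothing may follow the semicolon:

```lua
local lpeg = require "lpeg"
local P, R, V = lpeg.P, lpeg.R, lpeg.V

-- Placeholder tokens, for illustration only.
local ident  = (R("az", "AZ") + P"_") * (R("az", "AZ", "09") + P"_")^0
local number = R"09"^1

local top_expr = P{
  "top_expr",
  top_expr     = V"primary_expr" * V"p_and_f" * P";" * P(-1),
  -- As many postfix operators as possible, but the last one must be a call.
  p_and_f      = V"postfix_op" * V"p_and_f" + V"func_call_op",
  postfix_op   = V"func_call_op" + V"index_op",
  -- Simplified stand-ins for the real rules:
  primary_expr = ident + number,
  func_call_op = P"(" * V"primary_expr"^-1 * P")",
  index_op     = P"[" * V"primary_expr" * P"]",
}

for _, src in ipairs{ "func();", "func(x)(y);", "func_arr[z]();",
                      "exp;", "func()[1];", "4 + 5;" } do
  print(src, top_expr:match(src) ~= nil)
end
```

With these placeholder rules, the first three inputs match and the last three do not.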

Upvotes: 0

Nick Gammon

Reputation: 1171

How about this?

local final = P'b' * P(-1)
local patt =  (S'ab' - final)^0 * final

The pattern final is what we need at the end of the string.

The pattern patt repeatedly matches a character from the set 'ab', but only at positions where final does not match. It then requires final itself. That stops the final 'b' from being eaten.

This doesn't guarantee that we get any a's (but neither would the pattern in the question have).
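A quick way to check this against the strings from the question (just a test harness around the two lines above):

```lua
local lpeg = require "lpeg"
local P, S = lpeg.P, lpeg.S

local final = P'b' * P(-1)                 -- a 'b' followed by end of subject
local patt  = (S'ab' - final)^0 * final    -- stop repeating where final matches

print(patt:match("aaab"))   --> 5
print(patt:match("bbabb"))  --> 6
print(patt:match("aaa"))    --> nil
print(patt:match("bba"))    --> nil
```

Note that final is tied to the end of the subject here, so this only works when the string ends right after the last 'b'.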

Upvotes: 0

Paul Kulchenko

Reputation: 26744

Yes, there is no backtracking, so you've correctly identified the problem. I think the solution is to list the valid postfix_op expressions; I'd change V'func_call_op' + V'index_op' to V'func_call_op'^0 * V'index_op' and also change the final V'func_call_op' to V'func_call_op'^1 to allow several function calls at the end.

Update: as suggested in the comments, the solution to the a/b problem would be (P'b'^0 * P'a')^0 * P'b'^1.
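For reference, a small sketch that tries the a/b pattern on the strings from the question. The P(-1) at the end is an extra assumption so that the whole subject must match:

```lua
local lpeg = require "lpeg"
local P = lpeg.P

-- Zero or more groups of "any b's, then one a", followed by at least one b.
-- Each group ends in 'a', so the repetition can never consume the trailing b's.
local patt = (P'b'^0 * P'a')^0 * P'b'^1 * P(-1)

print(patt:match("aaab"))   --> 5
print(patt:match("bbabb"))  --> 6
print(patt:match("aaa"))    --> nil
print(patt:match("bba"))    --> nil
```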

Upvotes: 1
