mottosson

PEG.js: get any text between ( and );

I'm trying to capture the text between parentheses, where the closing parenthesis is followed by a semicolon.

Example: (in here there can be 'anything' !"#¤);); any character is possible);

I've tried this:

Text
 = "(" text:(.*) ");" { return text.join(""); }

But it seems (.*) consumes everything, including the final );, before ");" gets a chance to match, so I get the error:

Expected ");" or any character but end of input found

The problem is that the text itself can contain ");", so I want the outermost (last) ); to decide where the text ends.

This regex \((.*)\); does what I want, but how can I do the same in PEG.js? I don't want to include the outer parentheses and semicolon in the result.
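
For reference, here is that regex behaving as described, run with JavaScript's built-in RegExp on the example input above. This is only a quick check of the regex, not part of a PEG.js solution:

```javascript
// The example input from the question.
const input = `(in here there can be 'anything' !"#¤);); any character is possible);`;

// \((.*)\); : the greedy .* runs to the end of the line, then the engine
// backtracks until \); can match, so the capture stops at the LAST ");".
const match = /\((.*)\);/.exec(input);

console.log(match[1]);
// in here there can be 'anything' !"#¤);); any character is possible
```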

This seems like it should be quite easy if you know what you're doing =P

Answers (1)

paulotorrens

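So, the point is that a PEG is deterministic, while a regex is not: a PEG's * is greedy and possessive, so once it has accepted some input it never backtracks to give any of it back. That is exactly what happens with your (.*): it eats the final ); and the parser never reconsiders. We can, however, simulate the semantics you want. Since you say the regex \((.*)\); does what you want, let's translate it to a PEG.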
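
To make the difference concrete, here is a small hand-written JavaScript imitation of how a PEG parser evaluates the original rule "(" (.*) ");". It is not code generated by PEG.js, and the name parseOriginal is hypothetical. The * loop consumes greedily and never gives input back, so the final ");" has nothing left to match:

```javascript
// Hand-written imitation (not PEG.js output) of the question's rule:
//   Text = "(" text:(.*) ");"
// Returns the captured text, or null where PEG.js would report an error.
function parseOriginal(input) {
  let pos = 0;

  // "("
  if (!input.startsWith("(", pos)) return null;
  pos += 1;

  // (.*) is greedy and possessive: it consumes every remaining character,
  // and a PEG never goes back to try fewer repetitions.
  const start = pos;
  while (pos < input.length) pos += 1;
  const text = input.slice(start, pos);

  // ");" must now match at the end of the input, so it always fails here
  // (PEG.js: Expected ");" or any character but end of input found).
  if (!input.startsWith(");", pos)) return null;

  return text;
}

console.log(parseOriginal("(abc);")); // null
```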

What does this regex do? The .* consumes all characters up to the end of the input, then the engine backtracks one character at a time until \); can match. In other words, the match ends at the last ); in the input.
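
The backtracking step can be spelled out by hand. This sketch (regexStyleCapture is a made-up name) assumes the input starts with "(" and ignores the fact that . in a JavaScript regex does not match line terminators. It lets (.*) take everything, then tries shorter and shorter matches until ); follows, which lands on the last ); in the input:

```javascript
// Imitates how a backtracking regex engine runs \((.*)\); on an input that
// starts with "(": .* first grabs everything, then gives characters back.
function regexStyleCapture(input) {
  if (input[0] !== "(") return null;

  // Try the longest possible (.*) first, then shorter and shorter ones.
  for (let end = input.length; end >= 1; end -= 1) {
    if (input.startsWith(");", end)) {
      return input.slice(1, end); // first success = last ");" in the input
    }
  }
  return null; // no ");" after the "(" at all
}

console.log(regexStyleCapture("(a);b);")); // a);b
```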

To get the same result with a PEG, we can use a lookahead: keep consuming characters if and only if there is still a ); somewhere ahead.
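
As a quick illustration of that idea outside PEG.js, the loop below keeps consuming characters only while another ); still starts after the current one. It uses String.prototype.indexOf as a stand-in for the lookahead; consumeWhileTerminatorAhead is a hypothetical helper name:

```javascript
// Consume characters from `pos` while a ");" still starts at a later index.
// Returns the start of the last ");" at or after `pos`, or `pos` itself
// if there is none.
function consumeWhileTerminatorAhead(input, pos) {
  // Lookahead: is there a ");" starting after the current character?
  while (pos < input.length && input.indexOf(");", pos + 1) !== -1) {
    pos += 1; // consume one character
  }
  return pos;
}

const input = "(a);b);";
const stop = consumeWhileTerminatorAhead(input, 1);
console.log(input.slice(1, stop)); // a);b
```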

So, a solution is:

Text
 = "(" text:TextUntilTerminator ");" { return text.join(""); }

TextUntilTerminator
 = x:(&HaveTerminatorAhead .)* { return x.map(y => y[1]) }

HaveTerminatorAhead
 = . (!");" .)* ");"

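The TextUntilTerminator non-terminal repeatedly checks that HaveTerminatorAhead matches, without consuming any input (that is what the & lookahead does), and then consumes a single character. It keeps doing this until it reaches the final ); in the input, where the lookahead fails and the repetition stops. Each repetition yields a [lookahead result, character] pair, which is why the action keeps only y[1].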

The HaveTerminatorAhead non-terminal is simple: it checks that there is at least one character ahead and, if there is, guarantees that at least one ); follows it. We also use the negative lookahead ! to stop at the first ); we see (rather than consuming it, which would reproduce your original problem).
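
Putting the three rules together, here is a hand-written JavaScript sketch that mirrors the grammar one function per rule, with & and ! implemented as checks that do not advance the position. It only illustrates the semantics; the function names are hypothetical, and in practice you would let PEG.js generate the parser from the grammar above. The final whole-input check reflects PEG.js's default behavior of requiring the start rule to consume the entire input:

```javascript
// HaveTerminatorAhead = . (!");" .)* ");"
// Returns the position after the match, or -1 on failure.
function haveTerminatorAhead(input, pos) {
  if (pos >= input.length) return -1;           // .
  pos += 1;
  while (!input.startsWith(");", pos)) {        // (!");" .)*
    if (pos >= input.length) return -1;         // ran out: ");" cannot match
    pos += 1;
  }
  return pos + 2;                               // ");"
}

// TextUntilTerminator = x:(&HaveTerminatorAhead .)*
function textUntilTerminator(input, pos) {
  const chars = [];
  // &HaveTerminatorAhead: test it, but keep `pos` where it is.
  while (haveTerminatorAhead(input, pos) !== -1) {
    chars.push(input[pos]); // . (the lookahead guarantees a character here)
    pos += 1;
  }
  return { pos, chars };
}

// Text = "(" text:TextUntilTerminator ");" { return text.join(""); }
function parseText(input) {
  if (!input.startsWith("(", 0)) return null;    // "("
  const { pos, chars } = textUntilTerminator(input, 1);
  if (!input.startsWith(");", pos)) return null; // ");"
  if (pos + 2 !== input.length) return null;     // whole input consumed
  return chars.join("");
}

const example = `(in here there can be 'anything' !"#¤);); any character is possible);`;
console.log(parseText(example));
// in here there can be 'anything' !"#¤);); any character is possible
```

Note that the lookahead rescans the rest of the input for every character consumed, so this approach is quadratic in the length of the text; that is fine for short lines like the example.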

This PEG, then, reproduces the behavior of the regex you suggested.
