Reputation: 817
I'm working on a PEG parser. Among other structures, it needs to parse a tag directive. A tag can contain any character. If you want the tag to include a closing curly brace }, you can escape it with a backslash. If you need a literal backslash, that should also be escaped. I tried to implement this, inspired by the PEG.js grammar for JSON: https://github.com/pegjs/pegjs/blob/master/examples/json.pegjs
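There are two problems:
1. An escaped backslash is not handled as expected. Example input: { some characters but escape with a \\ }
2. The parser breaks on an escaped curly brace \}. Example input: { some characters but escape \} with a \\ }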
The relevant grammar is:
Tag
  = "{" _ tagContent:$(TagChar+) _ "}" {
      return { type: "tag", content: tagContent }
    }

TagChar
  = [^\}\r\n]
  / Escape
    sequence:(
        "\\" { return {type: "char", char: "\\"}; }
      / "}" { return {type: "char", char: "\x7d"}; }
    )
    { return sequence; }

_ "whitespace"
  = [ \t\n\r]*

Escape
  = "\\"
You can easily test the grammar and the example input in the online PEG.js sandbox: https://pegjs.org/online
I hope somebody has an idea how to resolve this.
Upvotes: 2
Views: 775
Reputation: 817
For reference, the correct grammar is as follows:
Tag
  = "{" _ tagContent:TagChar+ _ "}" {
      return { type: "tag", content: tagContent.map(c => c.char || c).join('') }
    }

TagChar
  = [^}\\\r\n]
  / Escape
    sequence:(
        "\\" { return {type: "char", char: "\\"}; }
      / "}" { return {type: "char", char: "\x7d"}; }
    )
    { return sequence; }

_ "whitespace"
  = [ \t\n\r]*

Escape
  = "\\"
When using the following input:
{ some characters but escape \} with a \\ }
it will return:
{
  "type": "tag",
  "content": "some characters but escape } with a \ "
}
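To double-check the expected result without running PEG.js, the corrected rules can be mirrored by hand. This is only a sketch in plain JavaScript: parseTag is a hypothetical helper written for illustration, not code that PEG.js generates, and it assumes Tag is the start rule, so the whole input has to be consumed.

```javascript
// Hand-written mirror of the corrected Tag/TagChar rules, for illustration only.
// parseTag is a hypothetical helper; it is not part of PEG.js.
function parseTag(input) {
  const isSpace = (ch) => ch === " " || ch === "\t" || ch === "\n" || ch === "\r";
  let pos = 0;

  if (input[pos] !== "{") throw new SyntaxError('Expected "{" at 0');
  pos++;
  while (pos < input.length && isSpace(input[pos])) pos++; // leading "_"

  let content = "";
  let count = 0; // number of TagChar matches, "+" needs at least one
  while (pos < input.length) {
    const ch = input[pos];
    if (ch === "\\") {
      // Escape: only a backslash or a closing brace may follow.
      const next = input[pos + 1];
      if (next !== "\\" && next !== "}") break;
      content += next;
      pos += 2;
    } else if (ch === "}" || ch === "\r" || ch === "\n") {
      break; // excluded by [^}\\\r\n]
    } else {
      content += ch;
      pos++;
    }
    count++;
  }
  if (count === 0) throw new SyntaxError("Expected a tag character at " + pos);

  while (pos < input.length && isSpace(input[pos])) pos++; // trailing "_"
  if (input[pos] !== "}") throw new SyntaxError('Expected "}" at ' + pos);
  pos++;
  if (pos !== input.length) throw new SyntaxError("Unexpected input at " + pos);

  return { type: "tag", content };
}

// String.raw keeps the backslashes exactly as typed in the example input.
const example = parseTag(String.raw`{ some characters but escape \} with a \\ }`);
console.log(JSON.stringify(example.content));
```

Note that the trailing space stays in content: ordinary spaces are valid tag characters, so TagChar+ consumes them before the trailing whitespace rule gets a chance.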
Upvotes: 1
Reputation: 241721
These errors are both basically typos.
The first problem is the character class in your regular expression for tag characters. In a character class, \ continues to be an escape character, so [^\}\r\n] matches any character other than } (written with an unnecessary backslash escape), carriage return or newline. \ is such a character, so it's matched by the character class, and Escape is never tried.
Since your pattern for tag characters doesn't succeed in recognising \ as an Escape, the tag { \\ } is parsed as four characters (space, backslash, backslash, space) and the tag { \} } is parsed as terminating on the first }, creating a syntax error.
So you should fix the character class to [^}\\\r\n]. (I put the closing brace first in order to make it easier to read the falling timber. The order is irrelevant.)
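The effect of the two character classes is easy to check outside the parser generator. Here is a minimal sketch using plain JavaScript regular expressions, which treat a backslash inside a character class the same way in this case. The variable names are made up for illustration, and the anchors ^ and $ are only there so that each test looks at exactly one character.

```javascript
// Plain JavaScript regexes, used only to illustrate the character-class
// semantics; this is not the code PEG.js generates.
const originalClass = /^[^\}\r\n]$/; // "\}" is just an escaped "}"
const fixedClass = /^[^}\\\r\n]$/;   // "\\" excludes the backslash itself

const backslash = "\\"; // a single backslash character

console.log(originalClass.test(backslash)); // true: the class swallows the backslash
console.log(fixedClass.test(backslash));    // false: the backslash is left for Escape
console.log(fixedClass.test("}"));          // false: a bare brace still ends the tag
console.log(fixedClass.test("a"));          // true: ordinary characters still match
```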
Once you do that, you'll find that the parser still returns the string with the backslashes intact. That's because of the $ in your Tag pattern: "{" _ tagContent:$(TagChar+) _ "}". According to the documentation, the meaning of the $ operator is: (emphasis added)
$ expression
Try to match the expression. If the match succeeds, return the matched text instead of the match result.
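To make the difference concrete, here is a small sketch in plain JavaScript. The matchResult array is written out by hand to imitate what TagChar+ would produce for the text \} a \\ once the character class is fixed (plain characters as strings, escapes as the objects built by the actions); it is an illustration, not output captured from PEG.js.

```javascript
// Hand-written stand-ins for the two things a rule can hand to its action.
const matchedText = String.raw`\} a \\`; // what "$" returns: the raw input slice
const matchResult = [                    // what TagChar+ returns without "$"
  { type: "char", char: "}" },
  " ", "a", " ",
  { type: "char", char: "\\" },
];

// Without "$", the action has to flatten the mixed array itself:
// escape objects contribute their char, plain strings are used as they are.
const flattened = matchResult.map((c) => c.char || c).join("");

console.log(matchedText); // \} a \\
console.log(flattened);   // } a \
```

So removing the $ gives the action access to the unescaped characters, at the cost of having to join them back into a string.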
Upvotes: 3