Reputation: 35
The syntax I would like to parse is of the following kind:
# This is a comment
# This is a block. It starts with \begin{} and ends with \end{}
\begin{document}
# Within the document block other kinds of blocks can exist, and within them yet other kinds.
# Comments can exist anywhere in the code.
This is another block within the block. It is a paragraph, no formal \begin{} and \end{} are needed.
The parser infers its type as a ParagraphBlock. The block ends with the newline.
\end{document}
I am learning how to use PEG, and this is what I have developed so far for the current syntax:
Start
  = (Newline / Comment / DocumentBlock)*
Comment
  = '#' value: (!Newline .)* Newline? {
    return {
      type: "comment",
      value: value.map(y => y[1]).join('').trim()
    }
  }
Newline
  = [\n\r\t]
DocumentBlock
  = "\\begin\{document\}"
    (!"\\end\{document\}" DocumentChildren)*
    "\\end\{document\}"
DocumentChildren
  = NewlineBlock / ParagraphBlock
NewlineBlock
  = value: Newline*
  {
    return {
      type: "newline",
      value: value.length
    }
  }
ParagraphBlock
  = (!Newline .)* Newline
I am having some issues with infinite loops. The current code produces this error:
Line 19, column 5: Possible infinite loop when parsing (repetition used with an expression that may not consume any input).
What would be a correct implementation for the simple syntax above?
Upvotes: 1
Views: 760
Reputation: 4132
I think this is due to the NewlineBlock rule using a Kleene star on Newline.
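In DocumentBlock you have a repeated DocumentChildren. In NewlineBlock you have a repeated Newline, which means that NewlineBlock can always succeed by matching '', the null string. The repetition in DocumentBlock can then keep matching without consuming any input, which would cause an infinite loop.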
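To make the failure mode concrete, here is a minimal hand-written sketch in JavaScript. It is not how PEG.js works internally; the helpers `newline`, `star`, `plus`, and `countStalledIterations` are hypothetical names invented for this illustration. A parser here is a function that takes the input and a position and returns the new position, or -1 on failure. The sketch counts how many times an outer repetition would succeed without advancing, using an iteration cap in place of a real infinite loop.

```javascript
// Matches one of the characters in the question's Newline rule: [\n\r\t].
function newline(input, pos) {
  const ch = input[pos];
  return ch === "\n" || ch === "\r" || ch === "\t" ? pos + 1 : -1;
}

// Zero or more repetitions: always succeeds, possibly consuming nothing.
function star(parser) {
  return (input, pos) => {
    let next;
    while ((next = parser(input, pos)) !== -1) {
      pos = next;
    }
    return pos;
  };
}

// One or more repetitions: fails unless at least one match consumes input.
function plus(parser) {
  return (input, pos) => {
    const first = parser(input, pos);
    if (first === -1) {
      return -1;
    }
    return star(parser)(input, first);
  };
}

// Simulates an outer `(...)*` around `parser`, capped at `limit` iterations.
// Returns how many iterations succeeded without moving the position.
function countStalledIterations(parser, input, pos, limit) {
  let stalled = 0;
  for (let i = 0; i < limit; i++) {
    const next = parser(input, pos);
    if (next === -1) {
      break; // the inner parser failed, so the outer repetition stops
    }
    if (next === pos) {
      stalled++; // success with no progress: an uncapped loop would spin here
    }
    pos = next;
  }
  return stalled;
}
```

With the input "abc", `countStalledIterations(star(newline), "abc", 0, 5)` returns 5, because every iteration succeeds on the empty match, while `countStalledIterations(plus(newline), "abc", 0, 5)` returns 0, because the inner parser fails and the outer repetition ends.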
Changing the * in NewlineBlock to a + would fix the problem. That way it no longer has the option of returning the null string.
NewlineBlock
  = value: Newline+
  {
    return {
      type: "newline",
      value: value.length
    }
  }
Upvotes: 2