Reputation: 51
I'm trying to use pyPEG2 to translate MoinMoin markup to Markdown, and I need to pay attention to newlines in certain cases. However, I can't even get my newline parsing tests to work. I'm new to pyPEG and my Python is rusty. Please bear with me.
Here's the code:
#!/usr/local/bin/python3
from pypeg2 import *
import re
class Newline(List):
    grammar = re.compile(r'\n')
parse("\n", Newline)
parse("""
""", Newline)
This results in:
Traceback (most recent call last):
  File "./pyPegNewlineTest.py", line 7, in <module>
    parse("\n", Newline)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pypeg2/__init__.py", line 667, in parse
    t, r = parser.parse(text, thing)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pypeg2/__init__.py", line 794, in parse
    raise r
  File "<string>", line 2
    ^
SyntaxError: expecting match on \n
It's as if pypeg is inserting an empty line after the \n.
Trying other options such as
grammar = re.compile(r'\n', re.MULTILINE)
grammar = re.compile(r'\r\n|\r|\n', re.MULTILINE)
grammar = contiguous(re.compile(r'\r\n|\r|\n', re.MULTILINE))
and various combinations of those doesn't change the error message (although I don't think I tried all combinations). Changing Newline to subclass str instead of List doesn't change the error either.
Update
I have figured out that pypeg is stripping the newline before parsing it:
#!/usr/local/bin/python3
from pypeg2 import *
import re
class Newline(str):
    grammar = contiguous(re.compile(r'a'))
parse("\na", Newline)
parse("""
a""", Newline)
print("Success, of a sort.")
Running this results in:
Success, of a sort.
If I override Newline's parse method, I don't even see the newline. The first thing it gets is the "a". This is consistent with what I'm seeing elsewhere: pypeg strips all leading whitespace, even when you specify contiguous.
So, that's what's happening. Not sure what to do about it.
Upvotes: 5
Views: 662
Reputation: 21
Yes, by default pypeg removes whitespace, including newlines.
This is easily configurable by setting the optional whitespace argument of the parse() function, e.g.:
parse("\na", Newline, whitespace=re.compile(r"[ \t\r]"))
With this setting, spaces, tabs, and carriage returns are still skipped, but newlines (\n) are not.
With this change, the parser now correctly reports the syntax error for the input "\na":
SyntaxError: expecting match on a
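To see why the narrower pattern leaves the newline in place, here is a minimal sketch using only the standard re module. It assumes pyPEG2's default skip pattern is (?m)\s+ (worth checking against pypeg2.whitespace in your own install). The helper skip_leading is hypothetical and only imitates a single skip step; it is not pyPEG2's actual code. The trailing + in the narrow pattern is an addition to the pattern in the answer, so that a run of blanks is consumed in one match.

```python
import re

# Assumed default skip pattern in pyPEG2; \s matches "\n" as well as blanks.
default_ws = re.compile(r"(?m)\s+")

# Narrower pattern: spaces, tabs and carriage returns only, never "\n".
# The "+" is added here so a whole run of blanks is matched at once.
narrow_ws = re.compile(r"[ \t\r]+")


def skip_leading(pattern, text):
    """Hypothetical helper: drop one leading match of pattern from text."""
    m = pattern.match(text)
    return text[m.end():] if m else text


# With the default pattern the newline is gone before any grammar sees it.
print(repr(skip_leading(default_ws, "\na")))     # 'a'

# With the narrow pattern the newline survives and can be matched by a grammar.
print(repr(skip_leading(narrow_ws, "\na")))      # '\na'

# Leading blanks are still dropped, but skipping stops at the newline.
print(repr(skip_leading(narrow_ws, " \t\na")))   # '\na'
```

If that assumption holds, passing a pattern like narrow_ws as the whitespace argument of parse() should let a grammar such as re.compile(r'\n') see the newline it is meant to match.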
Upvotes: 2