I'm trying to parse something really simple in PyParsing that is multiline, but I'm struggling to understand why it doesn't work. The string I want to parse is as follows.
string = '''START
1 10; % Name1
2 20; % Name2
END'''
I know that every line between the START and END tokens will contain one or more positive or negative numbers, which can be int or float types. I also expect that a user may optionally add metadata after a % sign.
So I start by defining the basic grammar for Floats and Names.
Float = Word(nums + '.' + '-')
Name = Word(alphanums)
I know that a line can contain one or more Float tokens followed by a semicolon, and optionally by a % Name.
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Suppress(Optional(Literal('%'))) + Optional(OneOrMore(Name)('name')) + Suppress(LineEnd())
I expect many lines, so I can define the grammar for Lines as follows.
Lines = OneOrMore(Group(Line))
I use Group, as suggested by Paul in this answer, to make retrieving each line's results possible.
grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
grammar.parseString(string)
However, this throws the following error:
ParseException: Expected end of line (at char 62), (line:3, col:19)
Full code below for easier copy and pasting.
string = '''START
1 10; % Name1
2 20; % Name2
END'''
from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, LineStart, Group
Float = Word(nums + '.' + '-')
Name = Word(alphanums)
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Suppress(Optional(Literal('%'))) + Optional(OneOrMore(Name)('name')) + Suppress(LineEnd())
Lines = OneOrMore(Group(Line))
grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
grammar.parseString(string)
Edit:
I've also tried the following, to no avail.
string = '''START
1 10; % Name1
2 20; % Name2
END'''
from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, LineStart, Group
Float = Word(nums + '.' + '-')
Name = Word(alphanums)
NL = Suppress(LineEnd())
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Optional(~NL +
Suppress(Literal('%'))
+ OneOrMore(Name)('name') + NL ) | NL
Lines = OneOrMore(Group(Line))
grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
grammar.parseString(string)
The only thing that does seem to work is using restOfLine:
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Optional(restOfLine)
However, this does not return the portion after the semicolon in a structured fashion, so I have to parse it separately again. Is that the recommended approach?
Reputation: 1366
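Removing newlines from the default whitespace characters is what was needed to solve this. By default pyparsing skips newlines as whitespace, and because alphanums includes digits, OneOrMore(Name) ran past the end of the line and consumed the numbers on the next line as names, so LineEnd was never found where the grammar expected it. As Paul suggested in his comment, other improvements can be made to ensure that it parses floats and names more strictly.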
string = '''START
1 10; % Name1
2 20; % Name2
END'''
from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, LineStart, Group, ParserElement
ParserElement.setDefaultWhitespaceChars(" \t")
Float = Word(nums + '.' + '-')
Name = Word(alphanums)
NL = Suppress(LineEnd())
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Optional(~NL +
Suppress(Literal('%'))
+ OneOrMore(Name)('name') + NL ) | NL
Lines = OneOrMore(Group(Line))
grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
grammar.parseString(string)
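For the stricter parsing mentioned above, here is a minimal sketch of one way it could look. It is not Paul's exact suggestion: the Regex pattern for numbers, the rule that a name must start with a letter, the extra Group around the data and name parts, and the helper parse_block are all my own assumptions for illustration.

```python
from pyparsing import (Group, Keyword, LineEnd, OneOrMore, Optional,
                       ParserElement, Regex, Suppress, Word, alphanums, alphas)

# Newlines must stop counting as skippable whitespace *before* any grammar
# elements are created; otherwise Word and friends silently skip line breaks.
ParserElement.setDefaultWhitespaceChars(" \t")

NL = Suppress(LineEnd())

# Assumption: a number is an optional sign, digits, and an optional fraction.
# The parse action converts each matched string to a Python float.
Float = Regex(r"[+-]?\d+(?:\.\d+)?").setParseAction(lambda t: float(t[0]))

# Assumption: a name starts with a letter, so "2" or "20" can never be a name.
Name = Word(alphas, alphanums)

# Inner Groups keep the numbers and the names of a line in separate sub-lists.
Comment = Suppress("%") + Group(OneOrMore(Name))("name")
Line = Group(Group(OneOrMore(Float))("data") + Suppress(";") + Optional(Comment) + NL)
grammar = Suppress(Keyword("START")) + NL + OneOrMore(Line) + Suppress(Keyword("END"))


def parse_block(text):
    """Hypothetical helper: return one (numbers, names) tuple per data line."""
    rows = []
    for row in grammar.parseString(text):
        names = row.get("name")  # None when the line has no "% ..." part
        rows.append((row["data"].asList(),
                     names.asList() if names is not None else []))
    return rows


sample = """START
1 10; % Name1
2 -2.5;
END"""

for numbers, names in parse_block(sample):
    print(numbers, names)
```

Because START is now followed by an explicit NL, the grammar no longer needs the `| NL` alternative, which otherwise produces an empty group for the line break after START. Note that setDefaultWhitespaceChars is a global setting, so it affects every pyparsing element created afterwards in the same process.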