kdheepak

Multiline PyParsing example

I'm trying to parse something really simple but multiline in PyParsing, and I'm struggling to understand why it doesn't work. The string I want to parse is as follows.

string = '''START
    1   10; %   Name1
    2   20; %   Name2
END'''

I know that every line between the START and END tokens will contain one or more numbers, positive or negative, each of which can be an int or a float. I also expect that a user may optionally add metadata after a % sign.

So I start by defining the basic grammar for Floats and Names.

Float = Word(nums + '.' + '-')
Name = Word(alphanums)

I know that a line contains one or more Floats followed by a semicolon, optionally followed by a % and a Name.

Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Suppress(Optional(Literal('%'))) + Optional(OneOrMore(Name)('name')) + Suppress(LineEnd())

I expect many lines, so I can define the grammar for Lines as follows.

Lines = OneOrMore(Group(Line))

I use Group, as suggested by Paul in this answer, so that each line's results can be retrieved separately.

grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))

grammar.parseString(string)

However, this raises the following error:

ParseException: Expected end of line (at char 62), (line:3, col:19)

Full code below for easier copy and pasting.

string = '''START
    1   10; %   Name1
    2   20; %   Name2
END'''

from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, LineStart, Group

Float = Word(nums + '.' + '-')
Name = Word(alphanums)
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Suppress(Optional(Literal('%'))) + Optional(OneOrMore(Name)('name')) + Suppress(LineEnd())
Lines = OneOrMore(Group(Line))

grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
grammar.parseString(string)

Edit:

I also tried the following, to no avail.

string = '''START
    1   10; %   Name1
    2   20; %   Name2
END'''

from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, LineStart, Group

Float = Word(nums + '.' + '-')
Name = Word(alphanums)
NL = Suppress(LineEnd())
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Optional(~NL +
                                                            Suppress(Literal('%'))
                                                            + OneOrMore(Name)('name') + NL ) | NL
Lines = OneOrMore(Group(Line))

grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
grammar.parseString(string)

The only thing that does seem to work is using restOfLine:

Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Optional(restOfLine)

However, this does not return the portion after the semicolon in a structured form, so I have to parse it again separately. Is that the recommended approach?


Answers (1)

kdheepak

Removing newlines from the default whitespace characters was what solved this. As Paul suggested in his comment, further improvements can be made so that floats and names are parsed more strictly.
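To see why newlines matter here, the following is a minimal sketch of the underlying behaviour rather than part of the fix itself. By default pyparsing skips spaces, tabs and newlines before each token, and Word(alphanums) also accepts digits, so a repeated Name can run on into the next line. The variable names and the sample string are made up for illustration, and the per-element setWhitespaceChars call is shown only as a contrast to the global setting.

```python
from pyparsing import OneOrMore, Word, alphanums

# Hypothetical sample: the end of one data line and the start of the next.
sample = "Name1\n    2   20"

# Default whitespace handling: newlines are skipped like spaces and tabs,
# so the repetition continues onto the second line.
names_default = OneOrMore(Word(alphanums))
tokens_default = names_default.parseString(sample).asList()

# Same expression, but this one element only skips spaces and tabs,
# so the repetition stops at the newline.
names_same_line = OneOrMore(Word(alphanums).setWhitespaceChars(" \t"))
tokens_same_line = names_same_line.parseString(sample).asList()

print(tokens_default)    # ['Name1', '2', '20']
print(tokens_same_line)  # ['Name1']
```

With the default settings the numbers on the next line are consumed as names, which is why the parser only complains later, when it reaches the semicolon.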

string = '''START
    1   10; %   Name1
    2   20; %   Name2
END'''

from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, LineStart, Group, ParserElement

# Stop treating newlines as skippable whitespace; call this before defining the expressions below.
ParserElement.setDefaultWhitespaceChars(" \t")

Float = Word(nums + '.' + '-')
Name = Word(alphanums)
NL = Suppress(LineEnd())
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Optional(~NL +
                                                            Suppress(Literal('%'))
                                                            + OneOrMore(Name)('name') + NL ) | NL
Lines = OneOrMore(Group(Line))

grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
grammar.parseString(string)
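For the stricter parsing mentioned above, here is one possible sketch. It is my own guess at what "more strictly" could look like, not Paul's actual suggestion: the Regex pattern for numbers, the rule that a name must start with a letter, the extra Group wrappers and the sample value -2.5 are all assumptions. It keeps the camelCase method names used in the rest of this post (pyparsing 3 also provides snake_case equivalents).

```python
from pyparsing import (Group, Keyword, LineEnd, Literal, OneOrMore, Optional,
                       ParserElement, Regex, Suppress, Word, alphanums, alphas)

# Global setting: must run before any of the expressions below are created.
ParserElement.setDefaultWhitespaceChars(" \t")

text = '''START
    1   10; %   Name1
    2   -2.5; %   Name2
END'''

# Assumed number format: optional sign, digits, optional fractional part.
# The parse action converts each matched string to a Python float.
number = Regex(r"[+-]?\d+(?:\.\d*)?").setParseAction(lambda t: float(t[0]))

# Assumed name format: a letter followed by letters or digits.
name = Word(alphas, alphanums)

NL = Suppress(LineEnd())
comment = Suppress(Literal('%')) + Group(OneOrMore(name))('name')
line = Group(Group(OneOrMore(number))('data')
             + Suppress(Literal(';'))
             + Optional(comment)
             + NL)

# The newline after START is consumed explicitly.
grammar = (Suppress(Keyword('START')) + NL
           + OneOrMore(line)
           + Suppress(Keyword('END')))

result = grammar.parseString(text, parseAll=True)
rows = result.asList()
print(rows)  # [[[1.0, 10.0], ['Name1']], [[2.0, -2.5], ['Name2']]]

first_row_data = result[0]['data'].asList()
print(first_row_data)  # [1.0, 10.0]
```

Because a name now has to start with a letter, a stray number after the % sign is rejected instead of being silently accepted as a name, and something like "1-2.3" no longer passes as a single float.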
