PyParsing: shell style space escape using backslash

Question

I need to parse text made up of space-delimited key-value pairs of the form

<key>=<value> <key>=<value> ...

which is pretty straightforward with pyparsing, except when the values can contain spaces, e.g.

dog=blue cat="orange tangerine" mouse=a\ small\ grey\ mouse

What would a pyparsing grammar look like for the last pair? Pyparsing is greedy on spaces. It is further complicated by line-spanning text, which may look like

dog=blue cat="orange tangerine" mouse=a\ small\ grey\ mouse \
   lion=nonexistent

I looked at a few examples at http://pyparsing.wikispaces.com/share/view/7002417 and "Python/Pyparsing - Multiline quotes", which helped with multi-line text but not with backslash-escaped spaces.

jedwards · Accepted Answer

Assuming your input strings are in a file called "input.txt", the following works for your examples:

import pyparsing
from pyparsing import ZeroOrMore, Group


OP_EQ   = pyparsing.Literal('=').suppress()
DQUOTE  = pyparsing.Literal('"').suppress()
ESPACE  = pyparsing.Literal('\\ ').suppress().leaveWhitespace()  # escaped space inside a value
BSLASH  = pyparsing.Literal('\\')                                # line-continuation backslash

S       = pyparsing.Word(" \t\n").suppress().leaveWhitespace()   # run of literal whitespace

DELIM   = ZeroOrMore(S ^ BSLASH).suppress()

KEY     = pyparsing.Word(pyparsing.alphanums)("KEY")

VALTOK  = pyparsing.Word(pyparsing.printables, excludeChars='="\\')

QVALUE  = ( DQUOTE +
            Group(VALTOK + ZeroOrMore(S + VALTOK)) +
            DQUOTE
            )
NQVALUE = Group(VALTOK + ZeroOrMore(ESPACE + VALTOK))
VALUE   = (NQVALUE ^ QVALUE)("VALUE")

PAIR    = Group(KEY + OP_EQ + VALUE)("PAIR")

PAIRS   = (PAIR + ZeroOrMore(DELIM + PAIR))

with open('input.txt') as f:
    lines = f.read()

res = PAIRS.parseString(lines, parseAll=True)

for (k,v) in res:
    print('{} = "{}"'.format(k, ' '.join(v)))

Output:

dog = "blue"
cat = "orange tangerine"
mouse = "a small grey mouse"
dog = "blue"
cat = "orange tangerine"
mouse = "a small grey mouse"
lion = "nonexistent"

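And as XML, for reference:

<PAIR>
  <PAIR>
    <KEY>dog</KEY>
    <VALUE>
      <ITEM>blue</ITEM>
    </VALUE>
  </PAIR>
  <PAIR>
    <KEY>cat</KEY>
    <VALUE>
      <ITEM>orange</ITEM>
      <ITEM>tangerine</ITEM>
    </VALUE>
  </PAIR>
  <PAIR>
    <KEY>mouse</KEY>
    <VALUE>
      <ITEM>a</ITEM>
      <ITEM>small</ITEM>
      <ITEM>grey</ITEM>
      <ITEM>mouse</ITEM>
    </VALUE>
  </PAIR>
  <PAIR>
    <KEY>dog</KEY>
    <VALUE>
      <ITEM>blue</ITEM>
    </VALUE>
  </PAIR>
  <PAIR>
    <KEY>cat</KEY>
    <VALUE>
      <ITEM>orange</ITEM>
      <ITEM>tangerine</ITEM>
    </VALUE>
  </PAIR>
  <PAIR>
    <KEY>mouse</KEY>
    <VALUE>
      <ITEM>a</ITEM>
      <ITEM>small</ITEM>
      <ITEM>grey</ITEM>
      <ITEM>mouse</ITEM>
    </VALUE>
  </PAIR>
  <PAIR>
    <KEY>lion</KEY>
    <VALUE>
      <ITEM>nonexistent</ITEM>
    </VALUE>
  </PAIR>
</PAIR>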
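
As a cross-check on the grammar's results, the same input can be pushed through Python's standard shlex module, which already understands shell-style quoting and backslash escapes. This is a sketch of a swapped-in technique, not a pyparsing grammar: the function name shlex_pairs and the inline sample string are made up for illustration, and continued lines are joined by hand before splitting so the result does not depend on how shlex treats a backslash-newline.

```python
import shlex


def shlex_pairs(text):
    """Split shell-style key=value text into (key, value) tuples."""
    # Join continued lines up front: drop each backslash-newline pair.
    joined = text.replace("\\\n", "")
    pairs = []
    # shlex.split removes the double quotes and unescapes "\ " for us.
    for token in shlex.split(joined):
        key, sep, value = token.partition("=")
        if sep:  # ignore any token that has no '='
            pairs.append((key, value))
    return pairs


# Hypothetical inline sample mirroring the question's second example.
sample = (
    'dog=blue cat="orange tangerine" mouse=a\\ small\\ grey\\ mouse \\\n'
    '   lion=nonexistent'
)

for key, value in shlex_pairs(sample):
    print('{} = "{}"'.format(key, value))
```

Unlike the grammar, this yields each value as a single string rather than a list of words, so no ' '.join step is needed.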

Edit: FWIW, you could do this in regex:

import re

with open('input.txt') as f:
    lines = f.read()

mat = re.sub(r'=([^"]\w*(?:(?:\\ )\w*)*)', r'="\1"', lines)  # Quote unquoted values
mat = mat.replace("\\ ", " ").replace("\\\n", "")            # Replace escaped spaces, join continued lines
mat = re.findall(r'(\w*)="(.*?)"', mat)                      # Extract pairs
for (k,v) in mat:                                            # Print pairs
    print('{} = "{}"'.format(k, v))

Output:

dog = "blue"
cat = "orange tangerine"
mouse = "a small grey mouse"
dog = "blue"
cat = "orange tangerine"
mouse = "a small grey mouse"
lion = "nonexistent"
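
For experimenting without an input file, the three regex steps can be wrapped in a function that takes a string. This is a minimal sketch: the function name regex_pairs and the inline sample are assumptions for illustration, and it only handles values like those in the examples (word characters, backslash-escaped spaces, or double-quoted text).

```python
import re


def regex_pairs(text):
    """Return (key, value) tuples from shell-style key=value text."""
    # Step 1: wrap unquoted values in double quotes, keeping "\ " inside them.
    text = re.sub(r'=([^"]\w*(?:\\ \w*)*)', r'="\1"', text)
    # Step 2: unescape the spaces, then join lines ending in a backslash.
    text = text.replace("\\ ", " ").replace("\\\n", "")
    # Step 3: every value is now double-quoted, so one pattern extracts all pairs.
    return re.findall(r'(\w*)="(.*?)"', text)


# Hypothetical inline sample mirroring the question's second example.
sample = (
    'dog=blue cat="orange tangerine" mouse=a\\ small\\ grey\\ mouse \\\n'
    '   lion=nonexistent'
)

for key, value in regex_pairs(sample):
    print('{} = "{}"'.format(key, value))
```

The backslashes are doubled in the ordinary string literals and written once-escaped in the raw regex, so both refer to a single literal backslash in the input.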
