Hooked

Reputation: 88238

Python parsing tree-like data

I have glyph information from a font that looks like this:

(CHARACTER C T
   (CHARWD R 0.6944475)
   (CHARHT R 0.686111)
   (COMMENT
      (KRN C y R -0.027779)
      (KRN C e R -0.083334)
      (KRN C o R -0.083334)
      (KRN C r R -0.083334)
      (KRN C a R -0.083334)
      (KRN C A R -0.083334)
      (KRN C u R -0.083334)
      )
   )

Is there a straightforward way to parse this in Python? I've used BeautifulSoup before, but it requires nested <tag> </tag> style markup. It wouldn't be too hard to convert this to XML and back again, but that seems like reinventing the wheel. How would I get this information into a data object that I can manipulate and spit back out again?

Upvotes: 1

Views: 1114

Answers (2)

Adam Wagner

Reputation: 16117

You could use pyparsing. Your example looks very much like an s-expression, and pyparsing includes an s-expression parser among its examples: http://pyparsing.wikispaces.com/file/view/sexpParser.py
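If you would rather not add a dependency, the same idea (tokenize, then build nested lists with a stack) fits in a few lines of standard-library Python. This is a minimal sketch, not the pyparsing example itself: `tokenize`, `atom` and `parse_sexp` are hypothetical names, and it assumes tokens never contain whitespace, quotes or parentheses. Any token that looks like a decimal number becomes a float, so a digit used as a character name would also come back as a number.

```python
import re

# Assumption: a token is "(", ")" or a run of non-space, non-paren characters.
TOKEN_RE = re.compile(r"[()]|[^\s()]+")
# Assumption: only plain decimal numbers (optionally signed) count as numbers.
NUMBER_RE = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)")

def tokenize(text):
    """Split the input into parentheses and bare tokens."""
    return TOKEN_RE.findall(text)

def atom(token):
    """Convert decimal-looking tokens to float; keep everything else as str."""
    return float(token) if NUMBER_RE.fullmatch(token) else token

def parse_sexp(text):
    """Parse one parenthesised expression into nested Python lists."""
    stack = [[]]
    for token in tokenize(text):
        if token == "(":
            stack.append([])            # open a new nested group
        elif token == ")":
            if len(stack) < 2:
                raise ValueError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)      # attach finished group to its parent
        else:
            stack[-1].append(atom(token))
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one balanced expression")
    return stack[0][0]

glyph = parse_sexp("""(CHARACTER C T
   (CHARWD R 0.6944475)
   (CHARHT R 0.686111)
   (COMMENT
      (KRN C y R -0.027779)
      (KRN C e R -0.083334)
      )
   )""")
print(glyph[3])  # ['CHARWD', 'R', 0.6944475]
```

Lists are used instead of tuples so the result can be edited in place, e.g. `glyph[5][1][4] = -0.03` to change the first kerning value.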

Upvotes: 6

Sean McCully

Reputation: 1132

This will convert your data into a Python data structure (nested tuples). I'm not sure if it's what you're looking for.

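import ast
import re

s = """(CHARACTER C T
   (CHARWD R 0.6944475)
   (CHARHT R 0.686111)
   (COMMENT
      (KRN C y R -0.027779)
      (KRN C e R -0.083334)
      (KRN C o R -0.083334)
      (KRN C r R -0.083334)
      (KRN C a R -0.083334)
      (KRN C A R -0.083334)
      (KRN C u R -0.083334)
      )
   )"""

# Put a comma after every closing paren so sibling groups become tuple items.
t = s.replace(")", "),")
# Quote every bare word that starts with a letter and follow it with a comma.
# Numbers are left unquoted so they are read back as floats.
t = re.sub(r'([(,\s])([A-Za-z]\w*)', r'\1"\2",', t)
# Drop the trailing comma and evaluate the result as a tuple literal.
# ast.literal_eval only accepts literals, so it is safer than eval().
data = ast.literal_eval(t.rstrip().rstrip(","))
print(data)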
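To spit the data back out, a small recursive writer can turn nested tuples or lists like the one above back into the same parenthesised layout. This is a minimal sketch: `to_sexp` and `fmt_atom` are hypothetical names, it assumes each group lists its plain tokens before its nested groups (as in the glyph data), and floats are written with `repr`, so numbers may not be formatted exactly as in the original file. The example uses lists because tuples can't be modified in place.

```python
def fmt_atom(x):
    """Format a single token; repr() gives a float string that round-trips."""
    return repr(x) if isinstance(x, float) else str(x)

def to_sexp(node, indent=0):
    """Serialise nested lists/tuples into indented parenthesised text.

    Assumes plain tokens come before nested groups inside each group.
    """
    pad = "   " * indent
    head = []       # plain tokens on the opening line
    children = []   # nested groups, one per line
    for item in node:
        if isinstance(item, (list, tuple)):
            children.append(to_sexp(item, indent + 1))
        else:
            head.append(fmt_atom(item))
    if not children:
        return pad + "(" + " ".join(head) + ")"
    # Closing paren is indented one level deeper, matching the font file.
    return "\n".join([pad + "(" + " ".join(head)] + children + [pad + "   )"])

glyph = ["CHARACTER", "C", "T",
         ["CHARWD", "R", 0.6944475],
         ["COMMENT", ["KRN", "C", "y", "R", -0.027779]]]
glyph[4][1][4] = -0.03   # manipulate: change the kerning value for "y"
out = to_sexp(glyph)
print(out)
```

With tuples from `ast.literal_eval`, convert the parts you want to change to lists first; `to_sexp` itself accepts either.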

Upvotes: 2
