user1625344
user1625344

Reputation: 313

pyparsing nested assignment

I have a group of 500-600 files I want to search thru and extract data. I'm trying to use pyparsing with very limited success. There are only 3 things in a file (1) comments, (2) simple assignments and (3) nested assignments. The nesting gets about 6 levels deep.

My goal is to look at a particular value in a 3 level deep field and if it has a particular value, pull out a value from another 3rd level field that is part of the same 2nd level field.

First, is pyparsing the proper tool for doing this? Other recommendations if not?

I know how to build a list of files and iterate over them. Let me show a sample file and then the code I'm trying.

# TOP_OBJECT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TOP_OBJECT=
(
    obj_fmt=
    (
    obj_name="foo"
    obj_cre_date=737785182  # = Tue May 18 23:19:42 1993
    opj_data=
    (
        a="continue"
        b="quit"
    )
    obj_version=264192  # = Version 4.8.0
    )

# LEVEL1_OBJECT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    LEVEL1_OBJECT=
    (
        OBJ_part=
        (
        obj_type=1005
        obj_size=120
        )

# LEVEL2_OBJECT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        LEVEL2_OBJECT_A=
        (
            OBJ_part=
            (
            obj_type=3001
            obj_size=128
            )

            Another_part=
            (
                another_attr=
                (
                another_style=0
                another_param=2
                )
            )
        ) ### End of LEVEL2_OBJECT_A ###
# LEVEL2_OBJECT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        LEVEL2_OBJECT_B=
        (
            OBJ_part=
            (
            obj_type=3005
            obj_size=128
            )

            Another_part=
            (
                another_attr=
                (
                another_style=0
                another_param=8
                )
            )
        ) ### End of LEVEL2_OBJECT_B ###
    ) ### End of LEVEL1 OBJECT
) ### End of TOP_OBJECT ###

My code to digest the file looks like this:

from pyparsing import *

def Syntax():
    comment = Group("#" + restOfLine).suppress()
    eq = Literal('=')
    lpar  = Literal( '(' ).suppress()
    rpar  = Literal( ')' ).suppress()
    num = Word(nums)
    var = Word(alphas + "_")
    simpleAssign = var +  eq
    nestedAssign = Group(lpar + OneOrMore(simpleAssign) + rpar)
    expr = Forward()
    atom = nestedAssign | simpleAssign
    expr << atom 
    expr.ignore(comment)
    return expr

def main():

    expr = Syntax()
    results = expr.parseFile( "for_show.asc" )
    print results

if __name__ == '__main__':
    main()

My results don't descend: ['TOP_OBJECT', '=']

Right now I'm not handling quoted strings or numbers, just trying to understand parsing nested lists.

Upvotes: 1

Views: 258

Answers (1)

PaulMcG
PaulMcG

Reputation: 63762

Mostly there, just a few gaps in your parser - see the commented-out original code, compared to the current code:

def Syntax():
    comment = Group("#" + restOfLine).suppress()
    eq = Literal('=')
    lpar  = Literal( '(' ).suppress()
    rpar  = Literal( ')' ).suppress()
    num = Word(nums)
    #~ var = Word(alphas + "_")
    var = Word(alphas + "_", alphanums+"_")
    #~ simpleAssign = var +  eq
    expr = Forward()
    simpleAssign = var +  eq + (num | quotedString)
    #~ nestedAssign = Group(lpar + OneOrMore(simpleAssign) + rpar)
    nestedAssign = var +  eq + Group(lpar + OneOrMore(expr) + rpar)
    atom = nestedAssign | simpleAssign
    expr << atom 
    expr.ignore(comment)
    return expr

This gives:

['TOP_OBJECT',
 '=',
 ['obj_fmt',
  '=',
  ['obj_name',
   '=',
   '"foo"',
   'obj_cre_date',
   '=',
   '737785182',
   'opj_data',
   '=',
   ['a', '=', '"continue"', 'b', '=', '"quit"'],
   'obj_version',
   '=',
   '264192'],
  'LEVEL1_OBJECT',
  '=',
  ['OBJ_part',
   '=',
   ['obj_type', '=', '1005', 'obj_size', '=', '120'],
   'LEVEL2_OBJECT_A',
   '=',
   ['OBJ_part',
    '=',
    ['obj_type', '=', '3001', 'obj_size', '=', '128'],
    'Another_part',
    '=',
    ['another_attr',
     '=',
     ['another_style', '=', '0', 'another_param', '=', '2']]],
   'LEVEL2_OBJECT_B',
   '=',
   ['OBJ_part',
    '=',
    ['obj_type', '=', '3005', 'obj_size', '=', '128'],
    'Another_part',
    '=',
    ['another_attr',
     '=',
     ['another_style', '=', '0', 'another_param', '=', '8']]]]]]

If you wrap the expr inside nestedAssign's OneOrMore with Group

    nestedAssign = var +  eq + Group(lpar + OneOrMore(Group(expr)) + rpar)

, I think you'll get better structure for your repeated nested assignments:

['TOP_OBJECT',
 '=',
 [['obj_fmt',
   '=',
   [['obj_name', '=', '"foo"'],
    ['obj_cre_date', '=', '737785182'],
    ['opj_data', '=', [['a', '=', '"continue"'], ['b', '=', '"quit"']]],
    ['obj_version', '=', '264192']]],
  ['LEVEL1_OBJECT',
   '=',
   [['OBJ_part',
     '=',
     [['obj_type', '=', '1005'], ['obj_size', '=', '120']]],
    ['LEVEL2_OBJECT_A',
     '=',
     [['OBJ_part',
       '=',
       [['obj_type', '=', '3001'], ['obj_size', '=', '128']]],
      ['Another_part',
       '=',
       [['another_attr',
         '=',
         [['another_style', '=', '0'], ['another_param', '=', '2']]]]]]],
    ['LEVEL2_OBJECT_B',
     '=',
     [['OBJ_part',
       '=',
       [['obj_type', '=', '3005'], ['obj_size', '=', '128']]],
      ['Another_part',
       '=',
       [['another_attr',
         '=',
         [['another_style', '=', '0'], ['another_param', '=', '8']]]]]]]]]]]

Also, your originally posted code contained TABs, I find them to be more trouble than they are worth, better off using 4-space indents.

Upvotes: 1

Related Questions