user1130

Reputation: 1

PyParsing Parse nested loop with brace and specific header

I found several questions about pyparsing that deal with almost the same problem of parsing nested blocks, but even with those I can't find a solution to my errors.

I have the following format:

key value;
header_name "optional_metadata"
{
     key value;
     sub_header_name
     {
        key value;
     };
};
key value;

I used the following parser:

VALID_KEY_CHARACTERS = alphanums
VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"\'\-\.@]")

lbr = Literal( '{' ).suppress()
rbr = Literal( '}' ).suppress() + Literal(";").suppress()

expr = Forward()
atom = Word(VALID_KEY_CHARACTERS) + Optional(Word(VALID_VALUE_CHARACTERS))
pair = atom | lbr + OneOrMore( expr ) + rbr
expr << Group( atom + pair )

When I use it, I get only the "header_name" and "header_metadata". After I modified it, I got only the key/value inside a brace, and a Python exception is raised to report a parsing error (it expects '}' when reaching the sub_header_name).

Can anyone help me understand why? Thank you.

Upvotes: 2

Views: 1123

Answers (2)

antigenius

Reputation: 131

I was trying to parse Terraform resources using Python and hit the same problem as you.

Here is the gist for my parser.

The test case file "repository.tf" shows how the parser is able to parse nested braces with a specific header:

https://gist.github.com/antigenius0910/5e00e80cfadf48642acb44132acefb3a#file-parse-py-L95-L101

~/Downloads/5e00e80cfadf48642acb44132acefb3a-b514369c817885589911ca2c81fa367af4851d86 ᐅ python parse.py 

resource "github_repository" "tfer--test-002D-plugin-002D-example" {
  allow_merge_commit     = "true"
  allow_rebase_merge     = "true"
  allow_squash_merge     = "true"
  archived               = "true"
  default_branch         = "main"
  delete_branch_on_merge = "false"
  has_downloads          = "true"
  has_issues             = "true"
  has_projects           = "false"
  has_wiki               = "true"
  is_template            = "false"
  name                   = "test-plugin-example"
  private                = "true"

  template {
    owner      = "test"
    repository = "test-plugin-templattest-plugin-template"
  }

  visibility           = "internal"
  vulnerability_alerts = "false"
}

Hope this helps a little :)

Upvotes: 0

Michael0x2a

Reputation: 64068

I think the main problem is that your grammar does not fully describe the input, leading to several mismatches. The two main problems I saw were that you forgot that each key-value pair must end in a semicolon, and that you did not specify that a key-value pair can come after a closing curly brace. It also looks like the lines:

pair = atom | lbr + OneOrMore( expr ) + rbr
expr << Group( atom + pair )

...would require each set of curly braces to contain, at minimum, two key-value pairs or a key-value pair and a set of curly braces. I believe this would cause an error once you encounter the lines:

{
    key value;
};

...within your input, though I'm not entirely certain.
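That suspicion can be checked directly. The sketch below rebuilds the grammar from the question and feeds it two made-up one-line inputs: a block holding two pairs and a block holding one. Two things are my own substitutions rather than part of the original: `<<=` in place of `<<`, and the value characters spelled out as a plain string instead of going through srange. The helper name `accepts` is also hypothetical.

```python
from pyparsing import (Forward, Group, Literal, OneOrMore, Optional,
                       ParseException, Word, alphanums)

# Same character sets as the question (note: no underscore in the key set).
VALID_KEY_CHARACTERS = alphanums
VALID_VALUE_CHARACTERS = alphanums + "_\"'-.@"

lbr = Literal("{").suppress()
rbr = Literal("}").suppress() + Literal(";").suppress()

# The grammar from the question, unchanged apart from "<<=".
expr = Forward()
atom = Word(VALID_KEY_CHARACTERS) + Optional(Word(VALID_VALUE_CHARACTERS))
pair = atom | lbr + OneOrMore(expr) + rbr
expr <<= Group(atom + pair)


def accepts(text):
    """Return True if the question's grammar parses the text, else False."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


# Two pairs inside the braces: "a b" is the atom, "c d" is the pair.
two_pairs_ok = accepts("header { a b c d };")
# One pair inside the braces: nothing is left over to satisfy "pair".
one_pair_ok = accepts("header { key value };")
print(two_pairs_ok, one_pair_ok)
```

The two-pair block should parse, while the one-pair block should raise a ParseException, because every expr inside the braces needs an atom followed by either a second atom or another set of braces.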

In any case, after playing around with your grammar, I ended up with this:

from pyparsing import *

data = """key1 value1; 
header_name "optional_metadata"
{
     key2 value2;
     sub_header_name
     {
        key value;
     };
};
key3 value3;"""

# I'm reusing the key characters for the header names, which cannot contain a semicolon
VALID_KEY_CHARACTERS = srange("[a-zA-Z0-9_]")
VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"\'\-\.@]")

semicolon = Literal(';').suppress()
lbr = Literal('{').suppress()
rbr = Literal('}').suppress()

key = Word(VALID_KEY_CHARACTERS)
value = Word(VALID_VALUE_CHARACTERS)

key_pair = Group(key + value + semicolon)("key_pair")
metadata = Group(key + Optional(value))("metadata")

header = key_pair + Optional(metadata)

expr = Forward()
contents = Group(lbr + expr + rbr + semicolon)("contents")
expr << header + Optional(contents) + Optional(key_pair)

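# asXML() is a pyparsing 2.x method; as far as I know it was removed in 3.0, where dump() is the usual replacement
print(expr.parseString(data).asXML())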

This results in the following output:

<key_pair>
  <key_pair>
    <ITEM>key1</ITEM>
    <ITEM>value1</ITEM>
  </key_pair>
  <metadata>
    <ITEM>header_name</ITEM>
    <ITEM>&quot;optional_metadata&quot;</ITEM>
  </metadata>
  <contents>
    <key_pair>
      <ITEM>key2</ITEM>
      <ITEM>value2</ITEM>
    </key_pair>
    <metadata>
      <ITEM>sub_header_name</ITEM>
    </metadata>
    <contents>
      <key_pair>
        <ITEM>key</ITEM>
        <ITEM>value</ITEM>
      </key_pair>
    </contents>
  </contents>
  <key_pair>
    <ITEM>key3</ITEM>
    <ITEM>value3</ITEM>
  </key_pair>
</key_pair>

I'm not entirely sure if this is exactly what you were trying to accomplish, but hopefully it is close enough that you can tweak it to suit your particular task.
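If the input can hold any number of key-value pairs and nested blocks in any order, one possible tweak is to treat each of them as a "statement" and repeat it with ZeroOrMore. This is only a sketch under my own assumptions about the format (a header is one key, optionally followed by one metadata value, and every block ends in "};"). The names `statement`, `block` and `grammar` are mine.

```python
from pyparsing import (Forward, Group, Optional, Suppress, Word,
                       ZeroOrMore, alphanums)

DATA = """key1 value1;
header_name "optional_metadata"
{
     key2 value2;
     sub_header_name
     {
        key value;
     };
};
key3 value3;"""

KEY_CHARS = alphanums + "_"
VALUE_CHARS = alphanums + "_\"'-.@"

semicolon = Suppress(";")
lbr = Suppress("{")
rbr = Suppress("}")

key = Word(KEY_CHARS)
value = Word(VALUE_CHARS)

# "key value;" becomes a two-item group.
key_pair = Group(key + value + semicolon)

# header [metadata] { statements... }; becomes a group whose last item
# is the list of nested statements.
statement = Forward()
block = Group(key + Optional(value)
              + lbr + Group(ZeroOrMore(statement)) + rbr + semicolon)

# A statement is either a pair or a block. key_pair is tried first and
# fails on a header because no semicolon follows, so block is tried next.
statement <<= key_pair | block

grammar = ZeroOrMore(statement)

# parseAll=True makes trailing unparsed text an error instead of silently stopping.
result = grammar.parseString(DATA, parseAll=True).asList()
print(result)
```

With this shape, a block can be told apart from a pair because its last element is a list rather than a string.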

Upvotes: 1
