Reputation: 463
I basically have a file with this structure:
root \
{
field1 {
subfield_a {
"value1"
}
subfield_b {
"value2"
}
subfield_c {
"value1"
"value2"
"value3"
}
subfield_d {
}
}
field2 {
subfield_a {
"value1"
}
subfield_b {
"value1"
}
subfield_c {
"value1"
"value2"
"value3"
"value4"
"value5"
}
subfield_d {
}
}
}
I want to parse this file with Python to get a multidimensional array that contains all the values of a specific subfield (for example, subfield_c). E.g.:
tmp = magic_parse_function("subfield_c",file)
print tmp[0] # [ "value1", "value2", "value3"]
print tmp[1] # [ "value1", "value2", "value3", "value4", "value5"]
I'm pretty sure I have to use the pyparsing module, but I don't know where to start with defining the (regex?) expression. Can someone give me some pointers?
Upvotes: 0
Views: 1127
Reputation: 63719
You can let pyparsing take care of matching and iterating over the input. Just define what you want it to match, and pass it the body of the file as a string:
def magic_parse_function(fld_name, source):
    from pyparsing import Keyword, nestedExpr
    # define parser
    parser = Keyword(fld_name).suppress() + nestedExpr('{', '}')("content")
    # search input string for matching keyword and following braced content
    matches = parser.searchString(source)
    # remove quotation marks
    return [[qs.strip('"') for qs in r[0].asList()] for r in matches]
# read content of file into a string 'file_body' and pass it to the function
tmp = magic_parse_function("subfield_c",file_body)
print(tmp[0])
print(tmp[1])
prints:
['value1', 'value2', 'value3']
['value1', 'value2', 'value3', 'value4', 'value5']
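For comparison, since the question mentions regular expressions: if pyparsing is not available, the same result can be approximated with the standard `re` module. This is only a rough sketch, not a replacement for the parser above. The function name `find_subfield_values` and the shortened `sample` text are made up for illustration, and it assumes the requested subfield's braces contain only quoted strings, with no nested braces and no braces inside the quotes.

```python
import re

def find_subfield_values(fld_name, source):
    # Match the field name as a whole word, followed by its braced body.
    # Assumption: the body contains no nested braces.
    block = re.compile(r'\b' + re.escape(fld_name) + r'\b\s*\{([^{}]*)\}')
    # Within each body, collect the text between double quotes.
    return [re.findall(r'"([^"]*)"', body) for body in block.findall(source)]

# Shortened, hypothetical sample in the same shape as the question's file
sample = '''
root
{
    field1 {
        subfield_c {
            "value1"
            "value2"
        }
        subfield_d {
        }
    }
    field2 {
        subfield_c {
            "value1"
        }
    }
}
'''

result = find_subfield_values("subfield_c", sample)
print(result)  # [['value1', 'value2'], ['value1']]
```

Unlike nestedExpr, this pattern cannot follow nested braces, so asking it for field1 finds nothing. For anything beyond flat subfields, the pyparsing version is the safer choice.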
Upvotes: 1