Reputation: 16339
I'm looking for advice on how to build a data structure by parsing a file. This is the list I have in my file:
'01bpar( 2)= 0.23103878E-01 half_life= 0.3000133E+02 relax_time= 0.4328278E+02',
'01bpar( 3)= 0.00000000E+00',
'02epar( 1)= 0.49998963E+02',
'02epar( 2)= 0.23103878E-01 half_life= 0.3000133E+02 relax_time= 0.4328278E+02',
'02epar( 3)= 0.00000000E+00',
'02epar( 4)= 0.17862340E-01 half_life= 0.3880495E+02 relax_time= 0.5598371E+02',
'02bpar( 1)= 0.49998962E+02',
'02bpar( 2)= 0.23103878E-01 half_life= 0.3000133E+02 relax_time= 0.4328278E+02',
What I need to do is construct a data structure that should look like this:
http://img11.imageshack.us/img11/7645/datastructure.gif
(I couldn't post the image because of the new-user restriction.)
I've managed to write all the regexp filters that extract what is needed, but I can't work out how to construct the structure. Any ideas?
Upvotes: 0
Views: 227
Reputation: 11531
Consider using a dict of dicts.
#!/usr/bin/env python
import re
import pprint
raw = """'01bpar( 2)= 0.23103878E-01 half_life= 0.3000133E+02 relax_time= 0.4328278E+02',
'01bpar( 3)= 0.00000000E+00',
'02epar( 1)= 0.49998963E+02',
'02epar( 2)= 0.23103878E-01 half_life= 0.3000133E+02 relax_time= 0.4328278E+02',
'02epar( 3)= 0.00000000E+00',
'02epar( 4)= 0.17862340E-01 half_life= 0.3880495E+02 relax_time= 0.5598371E+02',
'02bpar( 1)= 0.49998962E+02',
'02bpar( 2)= 0.23103878E-01 half_life= 0.3000133E+02 relax_time= 0.4328278E+02',"""
datastruct = {}
pattern = re.compile(r"""\D(?P<digits>\d+)(?P<field>[eb]par)[^=]+=\D+(?P<number>\d+\.\d+E[+-]\d+)""")
for line in raw.splitlines():
    result = pattern.search(line)
    parts = result.groupdict()
    # create the entry for this two-digit prefix the first time it is seen
    if parts['digits'] not in datastruct:
        datastruct[parts['digits']] = {'epar': [], 'bpar': []}
    datastruct[parts['digits']][parts['field']].append(parts['number'])
pprint.pprint(datastruct, depth=4)
Produces:
{'01': {'bpar': ['0.23103878E-01', '0.00000000E+00'], 'epar': []},
 '02': {'bpar': ['0.49998962E+02', '0.23103878E-01'],
        'epar': ['0.49998963E+02',
                 '0.23103878E-01',
                 '0.00000000E+00',
                 '0.17862340E-01']}}
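The values in this structure are still strings. If numbers are needed downstream, Python's built-in float() accepts this exponent notation directly. A minimal sketch, assuming the structure has the shape printed above (the name `numeric` is only illustrative):

```python
# Structure in the same shape as the output above (strings, not numbers).
datastruct = {
    '01': {'bpar': ['0.23103878E-01', '0.00000000E+00'], 'epar': []},
    '02': {'bpar': ['0.49998962E+02', '0.23103878E-01'],
           'epar': ['0.49998963E+02', '0.23103878E-01',
                    '0.00000000E+00', '0.17862340E-01']},
}

# Build a parallel structure with every string converted by float().
# 'numeric' is a hypothetical name chosen for this sketch.
numeric = {
    digits: {field: [float(v) for v in values]
             for field, values in fields.items()}
    for digits, fields in datastruct.items()
}

print(numeric['02']['epar'][0])
```

Converting after parsing keeps the regex step unchanged; the conversion could equally be done at the point where each value is appended.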
Revised version in light of comments:
from collections import defaultdict
pattern = re.compile(r"""\D(?P<digits>\d+)(?P<field>[eb]par)[^=]+=\D+(?P<number>\d+\.\d+E[+-]\d+)""")
# every new key starts with empty 'epar' and 'bpar' lists
default = lambda: {'epar': [], 'bpar': []}
datastruct = defaultdict(default)
for line in raw.splitlines():
    result = pattern.search(line)
    parts = result.groupdict()
    datastruct[parts['digits']][parts['field']].append(parts['number'])
pprint.pprint(list(datastruct.items()))
which produces (the order of the entries may differ depending on the Python version):
[('02',
  {'bpar': ['0.49998962E+02', '0.23103878E-01'],
   'epar': ['0.49998963E+02',
            '0.23103878E-01',
            '0.00000000E+00',
            '0.17862340E-01']}),
 ('01', {'bpar': ['0.23103878E-01', '0.00000000E+00'], 'epar': []})]
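Both versions call result.groupdict() unconditionally, so a line the pattern does not match (a blank line or a header, say) would raise AttributeError, because search() returns None in that case. A minimal sketch of one way to guard against that while reading from any iterable of lines; the function name parse_params and the io.StringIO stand-in for the real file are assumptions made for illustration:

```python
import io
import re
from collections import defaultdict

pattern = re.compile(
    r"\D(?P<digits>\d+)(?P<field>[eb]par)[^=]+=\D+(?P<number>\d+\.\d+E[+-]\d+)")

def parse_params(lines):
    """Group the first number on each matching line by prefix and field."""
    datastruct = defaultdict(lambda: {'epar': [], 'bpar': []})
    for line in lines:
        result = pattern.search(line)
        if result is None:
            continue  # skip blank lines and anything else that does not match
        parts = result.groupdict()
        datastruct[parts['digits']][parts['field']].append(parts['number'])
    return dict(datastruct)

# io.StringIO stands in for an open file; an object returned by open() iterates the same way.
fake_file = io.StringIO(
    "some header line\n"
    "'01bpar( 3)= 0.00000000E+00',\n"
    "\n"
    "'02epar( 1)= 0.49998963E+02',\n"
)
parsed = parse_params(fake_file)
```

Returning a plain dict at the end avoids surprising callers, since looking up a missing key in a defaultdict silently creates it.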
Upvotes: 1
Reputation: 63709
It's theoretically possible to have pyparsing create the whole structure using parse actions, but if you just name the various fields as I have below, building up the structure is not too bad. And if you want to convert to regular expressions, this example should give you a start on how things might look:
source = """\
'01bpar( 2)= 0.23103878E-01 half_life= 0.3000133E+02 relax_time= 0.4328278E+02',
'01bpar( 3)= 0.00000000E+00',
'02epar( 1)= 0.49998963E+02',
'02epar( 2)= 0.23103878E-01 half_life= 0.3000133E+02 relax_time= 0.4328278E+02',
'02epar( 3)= 0.00000000E+00',
'02epar( 4)= 0.17862340E-01 half_life= 0.3880495E+02 relax_time= 0.5598371E+02',
'02bpar( 1)= 0.49998962E+02',
'02bpar( 2)= 0.23103878E-01 half_life= 0.3000133E+02 relax_time= 0.4328278E+02', """
from pyparsing import Literal, Regex, Word, alphas, nums, oneOf, OneOrMore, quotedString, removeQuotes
EQ = Literal('=').suppress()
scinotationnum = Regex(r'\d\.\d+E[+-]\d+')
dataname = Word(alphas+'_')
key = Word(nums,exact=2) + oneOf("bpar epar")
index = '(' + Word(nums) + ')'
# a keyed value looks like: half_life= 0.3000133E+02
keyedValue = dataname + EQ + scinotationnum
# define an item in the source - suppress values with keys, just want the unkeyed ones
item = key('key') + index + EQ + OneOrMore(keyedValue.suppress() | scinotationnum)('data')
# initialize summary structure
from collections import defaultdict
results = defaultdict(lambda : {'epar':[], 'bpar':[]})
# extract quoted strings from list
quotedString.setParseAction(removeQuotes)
for raw in quotedString.searchString(source):
    parts = item.parseString(raw[0])
    num, par = parts.key
    results[num][par].extend(parts.data)
# dump out results, or do whatever
from pprint import pprint
pprint(dict(results))
Prints:
{'01': {'bpar': ['0.23103878E-01', '0.00000000E+00'], 'epar': []},
 '02': {'bpar': ['0.49998962E+02', '0.23103878E-01'],
        'epar': ['0.49998963E+02',
                 '0.23103878E-01',
                 '0.00000000E+00',
                 '0.17862340E-01']}}
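For the conversion to regular expressions mentioned above, a rough counterpart might look like the following. This is only a sketch using the standard re module, not the pyparsing grammar itself; the names NUM, ITEM_RE and parse are made up for the example, and it assumes every entry is wrapped in single quotes as in the sample data:

```python
import re
from collections import defaultdict

NUM = r"\d\.\d+E[+-]\d+"
# Two digits, 'bpar' or 'epar', an index in parentheses, '=', then the unkeyed value.
ITEM_RE = re.compile(r"(\d{2})([be]par)\(\s*\d+\)=\s*(" + NUM + ")")

def parse(text):
    results = defaultdict(lambda: {'epar': [], 'bpar': []})
    # Pull out the contents of each single-quoted string.
    for quoted in re.findall(r"'([^']*)'", text):
        m = ITEM_RE.match(quoted)
        if m is None:
            continue  # not a parameter entry
        num, par, value = m.groups()
        # Only the first number is captured, so keyed values such as
        # half_life= and relax_time= are ignored.
        results[num][par].append(value)
    return dict(results)

sample = """\
'01bpar( 2)= 0.23103878E-01 half_life= 0.3000133E+02 relax_time= 0.4328278E+02',
'01bpar( 3)= 0.00000000E+00',
'02epar( 1)= 0.49998963E+02',
"""
parsed = parse(sample)
```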
Upvotes: 3
Reputation: 14900
Your top-level structure is positional, so a list is a natural choice for it. Since lists can hold arbitrary items, each entry can be a named tuple, and each field of the tuple can hold a list of its elements.
So, your code should look something like this pseudocode:
from collections import namedtuple
data = []
newTuple = namedtuple('stuff', ['epar', 'bpar'])
for line in theFile:
    # regexToGetThemFromString() stands in for your own regex extraction
    eparVals = regexToGetThemFromString()
    bparVals = regexToGetThemFromString()
    t = newTuple(eparVals, bparVals)
    data.append(t)
You said you could already loop over the file, and had various regex to get the data, so I didn't bother adding all the details.
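For reference, one way the pseudocode might be filled in is sketched below. It assumes the two-digit prefixes start at 01 and map to list positions 0, 1, and so on; the names Params, LINE_RE and parse_lines are hypothetical and the regex is a guess at what the asker's own filters do:

```python
import re
from collections import namedtuple

# Hypothetical record type: one entry per two-digit prefix in the file.
Params = namedtuple('Params', ['epar', 'bpar'])

LINE_RE = re.compile(
    r"(?P<digits>\d+)(?P<field>[eb]par)\(\s*\d+\)=\s*(?P<number>\d\.\d+E[+-]\d+)")

def parse_lines(lines):
    """Return a list of Params; data[0] holds the values for prefix '01'."""
    data = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like parameter entries
        position = int(match.group('digits')) - 1  # assumes prefixes start at 01
        # Grow the list until the slot for this prefix exists.
        while len(data) <= position:
            data.append(Params(epar=[], bpar=[]))
        values = getattr(data[position], match.group('field'))
        values.append(float(match.group('number')))
    return data

sample = [
    "'01bpar( 2)= 0.23103878E-01 half_life= 0.3000133E+02 relax_time= 0.4328278E+02',",
    "'01bpar( 3)= 0.00000000E+00',",
    "'02epar( 1)= 0.49998963E+02',",
    "'02bpar( 1)= 0.49998962E+02',",
]
data = parse_lines(sample)
```

With this layout, data[1].epar gives the epar values for prefix 02. The tuple itself is immutable, but the lists it holds can still be appended to.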
Upvotes: 0