Reputation: 9809
I have a string in the following format:
"2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:....."
This needs to be converted into a dictionary by splitting at the \r\n.
However, the difficult part is that, for the pairs between 3A and 4A, the key needs to be prefixed with 3A to make it apparent that they are a subset of 3A.
So the final expected output is as follows:
{'2A':'xxx','3A':'yyyy','3A-51':'yzzzz','3A-52':'yzyeys','4A':'.....'}
Is there an easier way than extracting all the data into a dictionary and then iterating through the dict with a for loop? Can this be done in a single pass while parsing?
Upvotes: 0
Views: 112
Reputation: 14209
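With the reduce function you can carry state along while iterating, which makes a one-liner possible:
>>> import re
>>> from functools import reduce  # only needed on Python 3
>>> s = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:....."
>>> reduce(lambda x, y: x + [y if re.match(r'\d+A.*', y) else x[-1][0:2] + '-' + y], s.split('\r\n'), [])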
['2A:xxx', '3A:yyyy', '3A-51:yzzzz', '3A-52:yzyeys', '4A:.....']
As Martin says, the split function splits the string into parts, and reduce passes along both the collection being populated and the new element. So you can look at the last element added (x[-1]) to get its identifier.
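Since the question asks for a dictionary rather than a list, the same reduce idea can be stretched a little further by splitting each entry on its first colon. This is only a sketch: it assumes Python 3 (where reduce lives in functools) and that parent keys end in "A"; the names to_dict and step are made up for illustration.

```python
from functools import reduce

def to_dict(s):
    """Fold the lines of s into a dict, prefixing child keys with the last parent key."""
    def step(state, line):
        result, parent = state
        key, value = line.split(':', 1)
        if key.endswith('A'):       # assumption: parent keys end in "A"
            parent = key
        else:
            key = parent + '-' + key
        result[key] = value
        return result, parent

    # The accumulator carries both the dict being built and the current parent key.
    result, _ = reduce(step, s.split('\r\n'), ({}, ''))
    return result

s = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:....."
print(to_dict(s))
```

Carrying the parent key in the accumulator avoids slicing the previous entry with [0:2], so it also copes with parent keys longer than two characters.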
Upvotes: 0
Reputation: 250911
def solve(strs):
    dic = {}
    prev = None  # most recent parent key (one containing "A")
    for x in strs.splitlines():
        key, val = x.split(":")
        if "A" not in key:  # or key.isdigit()
            new_key = "-".join((prev, key))
            dic[new_key] = val
        else:
            dic[key] = val
            prev = key
    return dic

strs = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:"
print(solve(strs))
output:
{'3A-52': 'yzyeys', '3A': 'yyyy', '3A-51': 'yzzzz', '2A': 'xxx', '4A': ''}
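One edge case in the function above: if the very first line has no "A" in its key, prev is still None and "-".join raises a TypeError. A hedged variant follows, assuming such orphan lines should simply keep their bare key (that policy, and the name solve_safe, are my assumptions, not part of the question).

```python
def solve_safe(strs):
    """Like solve(), but tolerates child lines that appear before any parent."""
    dic = {}
    prev = None
    for x in strs.splitlines():
        key, val = x.split(":", 1)   # split once so values may contain ":"
        if "A" in key:
            prev = key
        elif prev is not None:
            key = "-".join((prev, key))
        # assumption: a child with no preceding parent keeps its bare key
        dic[key] = val
    return dic

print(solve_safe("51:orphan\r\n3A:yyyy\r\n52:child"))
```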
Upvotes: 0
Reputation: 214949
Off the top of my head:
dct = {}
last = ''
for line in s.splitlines():
    key, val = line.split(':')
    if key.isdigit():
        key = last + '-' + key
    else:
        last = key
    dct[key] = val
This works, but having "compound" keys is generally not the best way to work with hierarchical structures. I'd suggest something like this instead:
dct = {}
last = ''
for line in s.splitlines():
    key, val = line.split(':')
    if key.isdigit():
        dct[last].setdefault('items', {})[key] = {'value': val}
    else:
        dct[key] = {'value': val}
        last = key
This makes a dict like:
{'2A': {'value': 'xxx'},
 '3A': {'items': {'51': {'value': 'yzzzz'}, '52': {'value': 'yzyeys'}},
        'value': 'yyyy'},
 '4A': {'value': '.....'}}
Looks more complicated, but actually it would be much easier to work with.
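To illustrate the "easier to work with" point, here is a small sketch that wraps the nested loop above in a function (build_nested is a name I made up) and then pulls out all children of 3A, next to the string matching the flat compound keys would need for the same lookup.

```python
def build_nested(s):
    """Build the nested structure: each parent holds its value and optional child items."""
    dct = {}
    last = ''
    for line in s.splitlines():
        key, val = line.split(':', 1)
        if key.isdigit():
            dct[last].setdefault('items', {})[key] = {'value': val}
        else:
            dct[key] = {'value': val}
            last = key
    return dct

s = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:....."
dct = build_nested(s)

# Nested form: the children of 3A are one lookup away.
children = {k: v['value'] for k, v in dct['3A'].get('items', {}).items()}

# Flat form: the same question means parsing the compound keys again.
flat = {'2A': 'xxx', '3A': 'yyyy', '3A-51': 'yzzzz', '3A-52': 'yzyeys', '4A': '.....'}
flat_children = {k.split('-', 1)[1]: v for k, v in flat.items() if k.startswith('3A-')}

print(children)
print(flat_children)
```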
Upvotes: 1
Reputation: 1121534
str.splitlines() does most of the work for you:
>>> "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:.....".splitlines()
['2A:xxx', '3A:yyyy', '51:yzzzz', '52:yzyeys', '4A:.....']
The tricky bit here is tracking the 3A key; presumably it's the A in the key that defines the hierarchy.
It's best to split that out to a generator:
def hierarchy_key_values(lines):
    parent = ''
    for line in lines:
        key, value = line.split(':', 1)
        if key[-1] == 'A':
            parent = key + '-'
        else:
            key = parent + key
        yield key, value
The rest is easy:
your_dict = dict(hierarchy_key_values(input_text.splitlines()))
Demo with your example input:
>>> dict(hierarchy_key_values(input_text.splitlines()))
{'3A-52': 'yzyeys', '3A': 'yyyy', '3A-51': 'yzzzz', '2A': 'xxx', '4A': '.....'}
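One detail worth spelling out is the second argument in line.split(':', 1): it limits the split to the first colon, so values that contain colons themselves survive intact. A quick sketch with made-up sample values (the timestamp and URL are hypothetical, not from the question):

```python
def hierarchy_key_values(lines):
    parent = ''
    for line in lines:
        # Split only on the first ':' so the value may contain colons.
        key, value = line.split(':', 1)
        if key[-1] == 'A':
            parent = key + '-'
        else:
            key = parent + key
        yield key, value

# Hypothetical input whose values contain ':' characters.
input_text = "3A:start\r\n51:12:30:00\r\n4A:http://example.com"
result = dict(hierarchy_key_values(input_text.splitlines()))
print(result)
```

With a plain line.split(':'), the second and third lines would produce more than two parts and the tuple unpacking would raise a ValueError.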
Upvotes: 1