IUnknown

Reputation: 9809

string processing while parsing into dictionary

I have a string in the following format:

"2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:....."

This needs to be converted into a dictionary by splitting at the \r\n.
However, the difficult part is that for the pairs between 3A and 4A, the key needs to be prefixed with 3A to make it apparent that they are a subset of 3A.
So the final expected output is as follows:

{'2A':'xxx','3A':'yyyy','3A-51':'yzzzz','3A-52':'yzyeys','4A':'.....'}

Is there an easier way than extracting all the data into a dictionary and then iterating through the dict with a for loop? Can this be done in a single pass while parsing?

Upvotes: 0

Views: 112

Answers (4)

Emmanuel

Reputation: 14209

With the reduce function you can carry state along while iterating, which makes a one-liner possible:

>>> import re
>>> from functools import reduce  # needed on Python 3; harmless on Python 2.6+
>>> s = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:....."
>>> reduce(lambda col, x: col + [x if re.match(r'\d+A.*', x) else col[-1][0:2] + '-' + x], s.split('\r\n'), [])
['2A:xxx', '3A:yyyy', '3A-51:yzzzz', '3A-52:yzyeys', '4A:.....']

As Martijn says, the split function splits the string into parts; reduce then passes the lambda the collection built so far (col) and the next element (x). So you can look at the last element added (col[-1]) to get its identifier. Note that col[-1][0:2] assumes a two-character parent key, as in the example.
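If a dictionary is wanted rather than a list of strings, the same reduce idea can carry both the dict and the current parent key in the accumulator. This is only a minimal sketch, not part of the answer above: the helper name `step` is made up, and the rule that top-level keys end in 'A' is an assumption drawn from the sample input.

```python
from functools import reduce  # built in on Python 2, imported on Python 3

s = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:....."

def step(acc, line):
    """Fold one 'key:value' line into the (dict, current_parent) pair."""
    dct, parent = acc
    key, value = line.split(':', 1)
    if key.endswith('A'):          # assumed marker for a top-level key
        parent = key
    else:
        key = parent + '-' + key   # child key: prefix with the last parent
    dct[key] = value
    return dct, parent

result, _ = reduce(step, s.splitlines(), ({}, ''))
print(result)
```

The accumulator is a tuple, so the parent key does not have to be recovered by slicing the previously added element.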

Upvotes: 0

Ashwini Chaudhary

Reputation: 250911

def solve(strs):
    dic = {}
    prev = None
    for x in strs.splitlines():
        key, val = x.split(":", 1)        # split only at the first colon
        if "A" not in key:                #or key.isdigit()
            new_key = "-".join((prev,key))
            dic[new_key] = val
        else:
            dic[key] = val
            prev = key
    return dic
strs = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:"
print(solve(strs))

output:

{'3A-52': 'yzyeys', '3A': 'yyyy', '3A-51': 'yzzzz', '2A': 'xxx', '4A': ''}

Upvotes: 0

georg

Reputation: 214949

Off the top of my head:

dct = {}
last = ''
for line in s.splitlines():
    key, val = line.split(':')
    if key.isdigit():
        key = last + '-' + key
    else:
        last = key
    dct[key] = val

This works, but having "compound" keys is generally not the best way to work with hierarchical structures. I'd suggest something like this instead:

dct = {}
last = ''
for line in s.splitlines():
    key, val = line.split(':')
    if key.isdigit():
        dct[last].setdefault('items', {})[key] = {'value': val }
    else:
        dct[key] = {'value': val }
        last = key

This makes a dict like:

{'2A': {'value': 'xxx'},
 '3A': {'items': {'51': {'value': 'yzzzz'}, '52': {'value': 'yzyeys'}},
        'value': 'yyyy'},
 '4A': {'value': '.....'}}

It looks more complicated, but it is actually much easier to work with.
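To illustrate that last point, here is a small sketch of reading the nested structure. The literal repeats the dict shown above; the helper name `children` is hypothetical.

```python
dct = {
    '2A': {'value': 'xxx'},
    '3A': {'value': 'yyyy',
           'items': {'51': {'value': 'yzzzz'}, '52': {'value': 'yzyeys'}}},
    '4A': {'value': '.....'},
}

def children(dct, key):
    """Return {child_key: value} for a top-level key, or {} if it has none."""
    items = dct[key].get('items', {})
    return dict((k, v['value']) for k, v in items.items())

print(children(dct, '3A'))
print(children(dct, '2A'))   # no 'items' entry, so this is empty
```

With compound keys, the same lookup would mean scanning every key for a '3A-' prefix.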

Upvotes: 1

Martijn Pieters

Reputation: 1121534

str.splitlines() does most of the work for you:

>>> "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:.....".splitlines()
['2A:xxx', '3A:yyyy', '51:yzzzz', '52:yzyeys', '4A:.....']

The tricky bit here is tracking the 3A key; presumably it's the A in the key that defines the hierarchy.

It's best to split that out to a generator:

def hierarchy_key_values(lines):
    parent = ''
    for line in lines:
        key, value = line.split(':', 1)
        if key[-1] == 'A':
            parent = key + '-'
        else:
            key = parent + key

        yield key, value

The rest is easy:

your_dict = dict(hierarchy_key_values(input_text.splitlines()))

Demo with your example input:

>>> dict(hierarchy_key_values(input_text.splitlines()))
{'3A-52': 'yzyeys', '3A': 'yyyy', '3A-51': 'yzzzz', '2A': 'xxx', '4A': '.....'}
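One detail worth noting in the generator is line.split(':', 1): limiting the split to a single cut keeps values that themselves contain a colon intact. A quick illustration, using a made-up line that is not from the question:

```python
line = "3A:10:30"                 # hypothetical value containing a colon

key, value = line.split(':', 1)   # split only at the first colon
print(key, value)

try:
    key, value = line.split(':')  # three parts cannot unpack into two names
except ValueError as exc:
    print("unlimited split fails:", exc)
```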

Upvotes: 1
