harshithavrao

Reputation: 1

Convert multi-line slash-delimited string into a nested dictionary

abc/pqr123/xy2/yes//T  
abc/pqr245/kl3/yes//T  
abc/ijk123/op5/yes//T  
abc/pqr245/kl4/yes//T

These are the input values that I want to convert to a nested dictionary. Each value, such as abc, pqr123, xy2, yes, or T, represents the name of a product.

My output should look something like this:

{"abc":{"pqr123":{"xy2":{"yes":{"T":[]}}},"pqr245":{"kl3":{"yes":{"T":[]}},"kl4":{"yes":{"T":[]}}},"ijk123":{"op5":{"yes":{"T":[]}}}}}

So I need a nested dictionary of all the unique values, where the last key of each path has an empty list as its value.

Below is the snippet of code that generates the output I require, but I want to do it more dynamically so that it still works if the length of the input grows or shrinks. Please let me know if there is a better solution to this problem.

data_dict = {}
for item in meta_line.split(','):
    item = item.replace('//', '/')
    item = str(item)
    item = item.split('/')
    if item[0] == "":
        continue

    if item[0] not in data_dict.keys():
        data_dict[item[0]] = {}
    if item[1] not in data_dict[item[0]].keys():
        data_dict[item[0]][item[1]] = {}
    if item[2] not in data_dict[item[0]][item[1]].keys():
        data_dict[item[0]][item[1]][item[2]] = {}
    if item[3] not in data_dict[item[0]][item[1]][item[2]].keys():
        data_dict[item[0]][item[1]][item[2]][item[3]] = {}
    if item[4] not in data_dict[item[0]][item[1]][item[2]][item[3]].keys():
        data_dict[item[0]][item[1]][item[2]][item[3]][item[4]] = []

Upvotes: 0

Views: 415

Answers (2)

PM 2Ring

Reputation: 55489

You can use the dict.setdefault method in a loop to build the nested dictionary. I'll use the pprint module to display the output. Note that pprint.pprint sorts dictionary keys by default, so the printed order can differ from the insertion order.

from pprint import pprint

data = '''\
abc/pqr123/xy2/yes//T
abc/pqr245/kl3/yes//T
abc/ijk123/op5/yes//T
abc/pqr245/kl4/yes//T
'''.splitlines()

nested_dict = {}

for row in data:
    d = nested_dict
    keys = [s for s in row.split('/') if s]
    for key in keys[:-1]:
        d = d.setdefault(key, {})
    d[keys[-1]] = []

pprint(nested_dict)

output

{'abc': {'ijk123': {'op5': {'yes': {'T': []}}},
         'pqr123': {'xy2': {'yes': {'T': []}}},
         'pqr245': {'kl3': {'yes': {'T': []}}, 'kl4': {'yes': {'T': []}}}}}
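
As a small illustration of why the loop above works (a sketch only, with made-up keys): dict.setdefault returns the existing value when the key is already present, and otherwise inserts the default and returns it. Either way you get back the child dict, so `d` can keep walking down the tree.

```python
d = {}

# Key is missing: setdefault inserts the new {} and returns it.
child = d.setdefault('abc', {})
child['x'] = 1

# Key is present: setdefault ignores the default and returns the stored dict.
same = d.setdefault('abc', {})

print(same is child)  # True
print(d)              # {'abc': {'x': 1}}
```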

Upvotes: 0

CJR

Reputation: 3985

You probably want something that doesn't depend on so many deeply nested brackets. This problem is a good fit for keeping a reference to a mutable object and moving it down the tree as you go.

meta_line = 'abc/pqr123/xy2/yes//T,abc/pqr245/kl3/yes//T,abc/ijk123/op5/yes//T,abc/pqr245/kl4/yes//T'

data = dict()
for item in meta_line.split(','):
    dref = data  # start each path at the root
    dict_tree = item.strip().replace('//', '/').split('/')
    for i, val in enumerate(dict_tree):
        if val in dref:
            pass
        elif i != len(dict_tree) - 1:
            dref[val] = dict()  # intermediate level
        else:
            dref[val] = list()  # last level holds an empty list
        dref = dref[val]  # move down one level

Every iteration of the inner loop moves the reference dref down one level, and every iteration of the outer loop resets it to the top of data. At the end, data should hold your nested dict.

Edit: Sorry, I just noticed that you wanted the last level to be a list. This is one solution to that problem, but isn't the best (it will create errors if there's a list in a spot that a later data entry wants to be a dict instead). I would probably choose to build my nested dict and then recursively replace any empty dicts with empty lists afterwards to avoid that problem.
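
A minimal sketch of that two-step approach, in case it helps. The helper names build_tree and empty_dicts_to_lists are made up for illustration, and the first step uses dict.setdefault instead of the if/elif chain above. It assumes the input is the same comma-separated meta_line as before.

```python
def build_tree(paths, sep='/'):
    """Build a nested dict of dicts from separator-delimited paths."""
    tree = {}
    for path in paths:
        node = tree
        for key in path.strip().split(sep):
            if key:  # skip the empty parts produced by '//'
                node = node.setdefault(key, {})
    return tree


def empty_dicts_to_lists(node):
    """Recursively replace every empty dict in the tree with an empty list."""
    for key, child in node.items():
        if child:
            empty_dicts_to_lists(child)
        else:
            node[key] = []  # reassigning an existing key is safe while iterating
    return node


meta_line = ('abc/pqr123/xy2/yes//T,abc/pqr245/kl3/yes//T,'
             'abc/ijk123/op5/yes//T,abc/pqr245/kl4/yes//T')

data = empty_dicts_to_lists(build_tree(meta_line.split(',')))
print(data)
```

Because every level is a dict until the final pass, a short path that is also a prefix of a longer one (say 'a/b' followed by 'a/b/c') no longer trips over a list in the middle of the tree.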

Upvotes: 1
