Moritz Wolff

Reputation: 508

Convert a list of files to a tree-like dictionary

Suppose I have a list that looks like this:

list_all_files = [['folder1', 'subfolder1', 'file1'], 
                  ['folder1', 'subfolder1', 'file2'],
                  ['folder1', 'subfolder1', 'file3'],
                  ['folder1', 'subfolder1', 'file4'],
                  ['folder1', 'subfolder2', 'file1'],
                  ['folder1', 'subfolder2', 'file2'],
                  ['folder2', 'subfolder1', 'file1'],
                  ['folder2', 'subfolder1', 'file2'],
                  ['folder3', 'file1'],
                  ['folder3', 'file2'],
                  ['folder4', 'subfolder1', 'file1'],
                  ['folder4', 'subfolder1', 'file2'],
                  ['folder2', 'subfolder2', 'file1'],
                  ['folder2', 'subfolder2', 'file2'],
                  ['folder2', 'subfolder2', 'file3'],
                  ['folder2', 'subfolder2', 'file4']]

"list_all_files" is just an example: the list could contain any number of folders, subfolders, and files, including none. How can I convert it to a dictionary that looks like the following?

dict_all_files = {
    'folder1': {'subfolder1': {'file1', 'file2', 'file3', 'file4'},
                'subfolder2': {'file1', 'file2'}},
    'folder2': {'subfolder1': {'file1', 'file2'},
                'subfolder2': {'file1', 'file2', 'file3', 'file4'}},
    'folder3': {'file1', 'file2'},
    'folder4': {'subfolder1': {'file1', 'file2'}},
}

I tried looping over the list and using dict.update(), starting like this:

dict_all_files = {}
for member in list_all_files:
    if member[0] == 'folder1':
        dict_all_files.update({'folder1': ''})
        for element in member:
            if member[1] == 'subfolder1':
                dict_all_files.update({'folder1': member[1]})

But this overwrites folders that are already in the dictionary, and I would have to write an if statement for every folder and subfolder by hand, which isn't practical. Since the approach itself is flawed, there is little point in patching this code. Perhaps I'm thinking about this wrong from the start? It would be nice if anyone could provide an answer or at least a hint. I haven't found an existing question that covers this or a similar problem.

Upvotes: 2

Views: 147

Answers (2)

Albin Paul

Reputation: 3419

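You can use dict.setdefault to simplify the code: it returns the value stored under a key, inserting the given default first if the key is missing.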

import pprint
list_all_files = [['folder1', 'subfolder1', 'file1'], 
                  ['folder1', 'subfolder1', 'file2'],
                  ['folder1', 'subfolder1', 'file3'],
                  ['folder1', 'subfolder1', 'file4'],
                  ['folder1', 'subfolder2', 'file1'],
                  ['folder1', 'subfolder2', 'file2'],
                  ['folder2', 'subfolder1', 'file1'],
                  ['folder2', 'subfolder1', 'file2'],
                  ['folder3', 'file1'],
                  ['folder3', 'file2'],
                  ['folder4', 'subfolder1', 'file1'],
                  ['folder4', 'subfolder1', 'file2'],
                  ['folder2', 'subfolder2', 'file1'],
                  ['folder2', 'subfolder2', 'file2'],
                  ['folder2', 'subfolder2', 'file3'],
                  ['folder2', 'subfolder2', 'file4']]

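result = {}
for path in list_all_files:
    head = result
    # Walk (or create) a nested dict for every folder except the last one.
    for name in path[:-2]:
        head = head.setdefault(name, {})
    # The last folder maps to a set that collects its file names.
    head.setdefault(path[-2], set()).add(path[-1])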

pprint.pprint(result)

OUTPUT (Python 2 repr; Python 3 prints each set with braces, e.g. {'file1', 'file2'})

{'folder1': {'subfolder1': set(['file1', 'file2', 'file3', 'file4']),
             'subfolder2': set(['file1', 'file2'])},
 'folder2': {'subfolder1': set(['file1', 'file2']),
             'subfolder2': set(['file1', 'file2', 'file3', 'file4'])},
 'folder3': set(['file1', 'file2']),
 'folder4': {'subfolder1': set(['file1', 'file2'])}}
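
The loop above indexes path[-2], so it assumes every path has at least a folder and a file. A minimal sketch of the same setdefault idea wrapped in a function follows. The name build_tree and the decision to skip paths with fewer than two components are assumptions made for illustration, not part of the original answer.

```python
def build_tree(paths):
    """Nest folder names as dicts; collect file names in a set under the last folder."""
    result = {}
    for path in paths:
        if len(path) < 2:
            # Assumption: a path without a parent folder has no place in this
            # structure, so it is skipped instead of raising IndexError.
            continue
        head = result
        # Walk (or create) a nested dict for every folder except the last one.
        for name in path[:-2]:
            head = head.setdefault(name, {})
        # The last folder maps to a set of file names.
        head.setdefault(path[-2], set()).add(path[-1])
    return result


tree = build_tree([
    ['folder1', 'subfolder1', 'file1'],
    ['folder1', 'subfolder1', 'file2'],
    ['folder3', 'file1'],
    ['file_at_top_level'],  # too short, skipped under the assumption above
])
```

Returning the dictionary instead of filling a module-level variable makes it possible to call the function on several lists independently.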

Upvotes: 5

Poojan

Reputation: 3519

list_all_files = [['folder1', 'subfolder1', 'file1'], 
                  ['folder1', 'subfolder1', 'file2'],
                  ['folder1', 'subfolder1', 'file3'],
                  ['folder1', 'subfolder1', 'file4'],
                  ['folder1', 'subfolder2', 'file1'],
                  ['folder1', 'subfolder2', 'file2'],
                  ['folder2', 'subfolder1', 'file1'],
                  ['folder2', 'subfolder1', 'file2'],
                  ['folder3', 'file1'],
                  ['folder3', 'file2'],
                  ['folder4', 'subfolder1', 'file1'],
                  ['folder4', 'subfolder1', 'file2'],
                  ['folder2', 'subfolder2', 'file1'],
                  ['folder2', 'subfolder2', 'file2'],
                  ['folder2', 'subfolder2', 'file3'],
                  ['folder2', 'subfolder2', 'file4']]

tree = dict()
def create_tree(l):
    for f in l:
        cur = tree
        # Every folder except the last one maps to a nested dict.
        for s in f[:-2]:
            if s not in cur:
                cur[s] = dict()
            cur = cur[s]

        # The last folder maps to a set of file names.
        if f[-2] not in cur:
            cur[f[-2]] = set()
        cur = cur[f[-2]]

        # Add the file name to that set.
        cur.add(f[-1])

create_tree(list_all_files)
tree

Output:

{'folder1': {'subfolder1': {'file1', 'file2', 'file3', 'file4'},
             'subfolder2': {'file1', 'file2'}},
 'folder2': {'subfolder1': {'file1', 'file2'},
             'subfolder2': {'file1', 'file2', 'file3', 'file4'}},
 'folder3': {'file1', 'file2'},
 'folder4': {'subfolder1': {'file1', 'file2'}}}
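
Note that this structure assumes a folder holds either subfolders or files, never both: if 'folder1' already maps to a dict, a later path such as ['folder1', 'some_file'] ends up calling .add on that dict and raises AttributeError. If the input may mix the two, one possible alternative (a sketch, not the format asked for in the question) is to make every folder a dict and map each file name to None. The function name build_mixed_tree and the None marker are assumptions chosen for illustration.

```python
def build_mixed_tree(paths):
    """Every folder is a dict; every file is a key whose value is None."""
    tree = {}
    for path in paths:
        if not path:
            # An empty path names nothing, so there is nothing to insert.
            continue
        *folders, filename = path
        node = tree
        for folder in folders:
            # Assumption: a name is never used for both a file and a folder
            # inside the same parent, so the stored value here is always a dict.
            node = node.setdefault(folder, {})
        node[filename] = None
    return tree


mixed = build_mixed_tree([
    ['folder1', 'subfolder1', 'file1'],
    ['folder1', 'file_next_to_subfolder'],
    ['top_level_file'],
])
```

With this layout, a value of None marks a file and a dict marks a folder, so files can sit next to subfolders and at the top level.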

Upvotes: 2
