Troy Ward
Troy Ward

Reputation: 87

Consolidate list of file paths

I have a list of directory paths that I am trying to reduce to the smallest list possible. For example:

bionic/main/installer-amd64/20101020ubuntu543/images/hd-media
bionic/main/installer-amd64/20101020ubuntu543/images/cdrom/xen
bionic/main/installer-amd64/20101020ubuntu543/images/cdrom
bionic/main/installer-amd64/20101020ubuntu543/images
bionic/main/installer-amd64/20101020ubuntu543

I can get rid of bionic/main/installer-amd64/20101020ubuntu543 because it can collapse into bionic/main/installer-amd64/20101020ubuntu543/images which I can also get rid of because it can collapse into bionic/main/installer-amd64/20101020ubuntu543/images/cdrom which then collapses into bionic/main/installer-amd64/20101020ubuntu543/images/cdrom/xen

I know there is a way to do this, but I can't figure it out for the life of me.

Upvotes: 2

Views: 78

Answers (2)

Wups
Wups

Reputation: 2569

There is a simpler solution: If you sort the list, it is enough to compare every dir with the next one in the list:

dirs = sorted(dirs)

result = []
for i, d in enumerate(dirs):
    if i == len(dirs) - 1 or not dirs[i+1].startswith(d): # unique
        result.append(d)

for d in result:
    print(d)

or, as a list comprehension:

result = [
    d for i, d in enumerate(sorted(dirs))
    if i == len(dirs) - 1 or not dirs[i+1].startswith(d)
]

Upvotes: 1

tobias_k
tobias_k

Reputation: 82929

If I understand your question correctly, you could put the directories into a tree structure and then get the leaf nodes from that tree, which are the directories that are not a parent to another directory in the list.

dirs = ["bionic/main/installer-amd64/20101020ubuntu543/images/hd-media",
        "bionic/main/installer-amd64/20101020ubuntu543/images/cdrom/xen",
        "bionic/main/installer-amd64/20101020ubuntu543/images/cdrom",
        "bionic/main/installer-amd64/20101020ubuntu543/images",
        "bionic/main/installer-amd64/20101020ubuntu543"]

tree = {}
for d in dirs:
    t = tree
    for x in d.split("/"):
        t = t.setdefault(x, {})
# {'bionic': {'main': {'installer-amd64': {'20101020ubuntu543': {'images': {'hd-media': {}, 'cdrom': {'xen': {}}}}}}}}

def leafs(tree):
    for x in tree:
        if tree[x]:
            for l in leafs(tree[x]):
                yield x + "/" + l
        else:
            yield x

for l in leafs(tree):
    print(l)
# bionic/main/installer-amd64/20101020ubuntu543/images/hd-media
# bionic/main/installer-amd64/20101020ubuntu543/images/cdrom/xen

Upvotes: 1

Related Questions