fdsgds

Reputation: 137

Find parent folders in list of paths

I have a list of folders like this:

u'Magazines/testfolder1',
u'Magazines/testfolder1/folder1/folder2/folder3',
u'Magazines/testfolder1/folder1/',
u'Magazines/testfolder1/folder1/folder2/',
u'Magazines/testfolder2',
u'Magazines/testfolder2/folder1/folder2/folder3',
u'Magazines/testfolder2/folder1/',
u'Magazines/testfolder2/folder1/folder2/',
u'Magazines/testfolder3',
u'Magazines/testfolder3/folder1/folder2/folder3',
u'Magazines/testfolder3/folder1/',
u'Magazines/testfolder3/folder1/folder2/',

What I want is a list of only the top-level parent folders.

That is, the example above should reduce to:

u'Magazines/testfolder1',
u'Magazines/testfolder2',
u'Magazines/testfolder3',

because these three folders contain all the others.

I am recursively adding folders to my database, so if testfolder1 is in the list, the script automatically recurses into its subfolders. I therefore don't need a subfolder in the list when its parent is also in the list.

How can I do that?

Upvotes: 0

Views: 204

Answers (4)

Marcin

Reputation: 49826

l =[u'Magazines/testfolder1',
    u'Magazines/testfolder1/folder1/folder2/folder3',
    u'Magazines/testfolder1/folder1/',
    u'Magazines/testfolder1/folder1/folder2/',
    u'Magazines/testfolder2',
    u'Magazines/testfolder2/folder1/folder2/folder3',
    u'Magazines/testfolder2/folder1/',
    u'Magazines/testfolder2/folder1/folder2/',
    u'Magazines/testfolder3',
    u'Magazines/testfolder3/folder1/folder2/folder3',
    u'Magazines/testfolder3/folder1/',
    u'Magazines/testfolder3/folder1/folder2/', ]

mincount = min(s.count('/') for s in l)
[d for d in sorted(l) if d.count('/') <= mincount]
#=> [u'Magazines/testfolder1', u'Magazines/testfolder2', u'Magazines/testfolder3']

It's not especially clever, but it works when all the paths share a common root, so that the parent folders are the entries with the fewest slashes.
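A minimal sketch of the same depth-counting idea wrapped in a function, assuming trailing slashes should be ignored when counting depth. The name `shallowest` and the shortened sample list are hypothetical, for illustration only:

```python
def shallowest(paths):
    """Return the paths with the fewest components, sorted."""
    # Strip trailing slashes so 'a/b/' and 'a/b' count as the same depth.
    cleaned = [p.rstrip('/') for p in paths]
    mincount = min(p.count('/') for p in cleaned)
    return sorted(p for p in cleaned if p.count('/') == mincount)


paths = [
    'Magazines/testfolder1',
    'Magazines/testfolder1/folder1/',
    'Magazines/testfolder2',
    'Magazines/testfolder2/folder1/folder2/',
]
print(shallowest(paths))
```

Like the snippet above, this only gives the right answer when every parent folder sits at the same depth, and `min` raises `ValueError` on an empty list.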

Upvotes: 0

falsetru

Reputation: 369054

Use a set:

>>> list_of_folders = [
...     u'Magazines/testfolder1',
...     u'Magazines/testfolder1/folder1/folder2/folder3',
...     u'Magazines/testfolder1/folder1/',
...     u'Magazines/testfolder1/folder1/folder2/',
...     u'Magazines/testfolder2',
...     u'Magazines/testfolder2/folder1/folder2/folder3',
...     u'Magazines/testfolder2/folder1/',
...     u'Magazines/testfolder2/folder1/folder2/',
...     u'Magazines/testfolder3',
...     u'Magazines/testfolder3/folder1/folder2/folder3',
...     u'Magazines/testfolder3/folder1/',
...     u'Magazines/testfolder3/folder1/folder2/',
... ]
>>> result = set()
>>> for folder in list_of_folders:
...     for parent in result:
...         if folder.startswith(parent):
...             break
...     else:
...         result.add(folder)
... 
>>> result
{'Magazines/testfolder3', 'Magazines/testfolder2', 'Magazines/testfolder1'}

UPDATE

list_of_folders = [
    ...
]
result = set()
# Sort first so that each parent is seen before its subfolders.
for folder in sorted(list_of_folders):
    if all(not folder.startswith(parent) for parent in result):
        result.add(folder)
print(result)
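Note that a plain `startswith` check also treats `Magazines/testfolder1` as a parent of a sibling such as `Magazines/testfolder10`. A minimal sketch that compares on path boundaries instead, assuming `/` is the separator and trailing slashes can be ignored (the function name `top_level_folders` and the sample list are hypothetical):

```python
def top_level_folders(paths):
    """Keep only the paths that have no ancestor elsewhere in the list."""
    result = []
    # Sorting places every parent before its descendants.
    for path in sorted(p.rstrip('/') for p in paths):
        # Compare against 'parent/' so that 'a/b' is not treated as a
        # parent of 'a/bc'; the equality check drops exact duplicates.
        if not any(path == parent or path.startswith(parent + '/')
                   for parent in result):
            result.append(path)
    return result


paths = [
    'Magazines/testfolder1/folder1/',
    'Magazines/testfolder1',
    'Magazines/testfolder10',
    'Magazines/testfolder2/folder1/folder2/',
    'Magazines/testfolder2',
]
print(top_level_folders(paths))
```

Because the input is sorted first, the result does not depend on the order in which the folders were listed.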

Upvotes: 2

misguided

Reputation: 3789

Mate, I believe the code below is the solution you are looking for:

lst = [
u'Magazines/testfolder1',
u'Magazines/testfolder1/folder1/folder2/folder3',
u'Magazines/testfolder1/folder1/',
u'Magazines/testfolder1/folder1/folder2/',
u'Magazines/testfolder2',
u'Magazines/testfolder2/folder1/folder2/folder3',
u'Magazines/testfolder2/folder1/',
u'Magazines/testfolder2/folder1/folder2/',
u'Magazines/testfolder3',
u'Magazines/testfolder3/folder1/folder2/folder3',
u'Magazines/testfolder3/folder1/',
u'Magazines/testfolder3/folder1/folder2/'
 ]

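# Iterate over copies so that removing items does not skip elements.
for x in lst[:]:
    for y in lst[:]:
        # y is a subfolder of x if it is longer and starts with x.
        if y.startswith(x) and len(x) < len(y):
            lst.remove(y)
print(lst)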

Output

[u'Magazines/testfolder1', u'Magazines/testfolder2', u'Magazines/testfolder3']

This program iteratively removes the subfolders from your list, leaving behind only the parent folders.

Upvotes: 0

kanghyojmun

Reputation: 350

How about using a regular expression?

import re

l = [
    u'Magazines/testfolder1',
    u'Magazines/testfolder1/folder1/folder2/folder3',
    u'Magazines/testfolder1/folder1/',
    u'Magazines/testfolder1/folder1/folder2/',
    u'Magazines/testfolder2',
    u'Magazines/testfolder2/folder1/folder2/folder3',
    u'Magazines/testfolder2/folder1/',
    u'Magazines/testfolder2/folder1/folder2/',
    u'Magazines/testfolder3',
    u'Magazines/testfolder3/folder1/folder2/folder3',
    u'Magazines/testfolder3/folder1/',
    u'Magazines/testfolder3/folder1/folder2/',
]

expect = [
    u'Magazines/testfolder1',
    u'Magazines/testfolder2',
    u'Magazines/testfolder3', 
]

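# Keep only paths with exactly two components (one slash, no trailing slash).
# A list comprehension returns a list on both Python 2 and Python 3.
result = [x for x in l if re.match(r'^[^/]+/[^/]+$', x)]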

assert expect == result

Upvotes: 0
