
Reputation: 5434

How can I get folder names that match a substring?

I need to recursively find all paths below a folder whose names contain the substring "Bar", in Python 2.7.

That is, for the folder structure

Foo
|
------ Doug
|        |
|        --------CandyBar
|
---------MilkBar

I need to get the list ["Foo/Doug/CandyBar", "Foo/MilkBar"].

Now, I can use os.walk and glob.glob and write a bunch of loops to get this list, but I'm wondering if I'm missing a simpler technique.

Upvotes: 1

Views: 1252

Answers (2)

whackamadoodle3000

Reputation: 6748

Try this:

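import os
# The match is case-sensitive ("Bar", not "bar") and tests the folder's own name.
# os.walk only yields directories, so no os.path.isdir check is needed.
[x for x, _, _ in os.walk("path") if "Bar" in os.path.basename(x)]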
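To check this against the tree from the question, here is a minimal sketch. The helper name find_dirs and the scratch directory are my own assumptions for illustration, not part of the answer. It matches on the folder's own name (os.path.basename) so that a parent folder containing "Bar" would not drag all of its children into the result.

```python
import os
import shutil
import tempfile


def find_dirs(root, substring):
    """Return directories under root whose own name contains substring."""
    return [
        dirpath
        for dirpath, _, _ in os.walk(root)
        if substring in os.path.basename(dirpath)
    ]


# Rebuild the tree from the question inside a scratch directory.
base = tempfile.mkdtemp()
try:
    os.makedirs(os.path.join(base, "Foo", "Doug", "CandyBar"))
    os.makedirs(os.path.join(base, "Foo", "MilkBar"))
    found = find_dirs(os.path.join(base, "Foo"), "Bar")
    # Report paths relative to the scratch directory, using "/" separators.
    relative = sorted(os.path.relpath(p, base).replace(os.sep, "/") for p in found)
    print(relative)  # ['Foo/Doug/CandyBar', 'Foo/MilkBar']
finally:
    shutil.rmtree(base)
```

The paths come back in whatever order os.walk visits them, so the sketch sorts them before printing.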

Upvotes: 2

alp

Reputation: 389

A generator may be a good choice here:

import os
res = (path for path, _, _ in os.walk("path") if "Bar" in os.path.basename(path))

NOTE: In the examples below I use "/" as the root path because my system is Unix-like. If you are on Windows, substitute "/" with "C:\\" (or whichever folder you want).

PROS:

  • Generators use far less memory and do not block the program while the results are being computed.

example:

# returns immediately: nothing is walked until you iterate
res = (path for path, _, _ in os.walk("/") if "Bar" in os.path.basename(path))

# I have to wait (who knows how long) until the whole tree has been walked
res = [path for path, _, _ in os.walk("/") if "Bar" in os.path.basename(path)]

  • You can get one path at a time, waiting only as long as it takes to find the next path.

example:

res = (path for path, _, _ in os.walk("/") if "Bar" in os.path.basename(path))
# the loop starts immediately
for path in res:
    # on each iteration I only wait the time needed to compute the next path
    print(path)  # each path is printed as soon as it is found

res = [path for path, _, _ in os.walk("/") if "Bar" in os.path.basename(path)]
# the loop starts only after all paths have been computed
for path in res:
    # no waiting inside the loop
    print(path)  # all paths are printed at once

  • If you only want to keep some of the paths found, you can store just those in a list and hold only the paths you are interested in (less memory usage).

example:

res = (path for path, _, _ in os.walk("/") if "Bar" in os.path.basename(path))
path_store = []
for path in res:
    # I'm only interested in paths having odd length;
    # at the end of the loop I will have used only the memory needed
    if len(path) % 2 == 1:
        path_store.append(path)

  • If at some point you are done and not interested in looking for more paths, you can stop at any moment, saving the time that would have been spent computing the remaining paths.

example:

res = (path for path, _, _ in os.walk("/") if "Bar" in os.path.basename(path))
path_store = []
count = 10
for path in res:
    # I'm only interested in paths having odd length
    if len(path) % 2 == 1:
        count -= 1
        path_store.append(path)
        # I'm only interested in the first 10 such paths.
        # With a generator I wait only for the computation of those 10 paths.
        # With a list you always wait for the computation of all paths.
        if count <= 0:
            break
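As a possible shortcut for the "stop early" case, itertools.islice can take the first few items from a generator without a manual counter. This is only a sketch: the function name iter_matching_dirs and the scratch folders Bar0 to Bar4 are made up for the demonstration.

```python
import itertools
import os
import shutil
import tempfile


def iter_matching_dirs(root, substring):
    """Lazily yield directories under root whose own name contains substring."""
    for dirpath, _, _ in os.walk(root):
        if substring in os.path.basename(dirpath):
            yield dirpath


# Hypothetical scratch tree with five matching folders.
base = tempfile.mkdtemp()
try:
    for i in range(5):
        os.makedirs(os.path.join(base, "Bar%d" % i))
    # islice stops asking for items after the second match,
    # so the walk is not driven any further than needed.
    first_two = list(itertools.islice(iter_matching_dirs(base, "Bar"), 2))
    print(len(first_two))  # 2
finally:
    shutil.rmtree(base)
```

Which two folders come back depends on the order in which os.walk lists them, so the sketch only checks how many there are.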

CONS:

  • You can't use indexes with generators. You can only get the next item.

  • If you want a list with all paths at once, you have to convert the generator to a list (in which case it is better to use a list comprehension from the start).

  • Generators are one-shot and forward-only (you can't go back after getting the next element).

  • If you want to keep a path, you HAVE TO store it somewhere (such as a list); otherwise it will be lost.

In the code below, path is overwritten at each iteration. At the end of the loop, res is exhausted and can no longer be used, so I have to store the paths I'm interested in in the list path_store.

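res = (path for path, _, _ in os.walk("/") if "Bar" in os.path.basename(path))
path_store = []
for path in res:
    # I'm only interested in paths having odd length
    if len(path) % 2 == 1:
        path_store.append(path)
path = next(res)  # raises StopIteration: res is exhausted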
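A small sketch of the one-shot behaviour, using a plain list as a stand-in for the os.walk results so that it runs anywhere. The names paths and matches are assumptions for illustration. Wrapping the generator expression in a function is one way to get a fresh generator whenever a second pass is needed, and next(res, None) avoids the StopIteration.

```python
# Stand-in for the directories os.walk would yield (hypothetical data).
paths = ["Foo", "Foo/Doug", "Foo/Doug/CandyBar", "Foo/MilkBar"]


def matches():
    """Build a new generator each time it is called."""
    return (p for p in paths if "Bar" in p)


res = matches()
first_pass = list(res)      # consumes the generator
second_pass = list(res)     # already exhausted, so this is empty
leftover = next(res, None)  # the default value avoids StopIteration
fresh = list(matches())     # a new generator starts from the beginning

print(first_pass)   # ['Foo/Doug/CandyBar', 'Foo/MilkBar']
print(second_pass)  # []
print(leftover)     # None
```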

Upvotes: 3
