recursively look for files and/or directories

Question

I have a directory tree with csv files, and I want to return files following this pattern (the pattern is from somewhere else, so I will need to stick to that):

"foo"

should match foo/**/*.csv and/or foo.csv, so that

"foo/bar"

matches e.g. foo/bar.csv, foo/bar/baz.csv and foo/bar/baz/qux.csv

So far, I have been iterating through the directory tree twice; first looking for files and then for directories:

from glob import iglob
from itertools import chain
import os

path = "csv_dir"
pattern = "foo/bar"
pattern = os.path.join(*pattern.split("/"))

path_with_pattern = os.path.join(path, pattern)

# first get all csv files in foo/bar and subdirs
files_1 = chain.from_iterable(iglob(os.path.join(root, '*.csv'))
                              for root, dirs, files in os.walk(path_with_pattern))

# then get all foo/bar.csv files
files_2 = chain.from_iterable(iglob(os.path.join(root, pattern + '.csv'))
                              for root, dirs, files in os.walk(path))

for f in chain(files_1, files_2):
    print(f)

This works, but it feels stupid to iterate the tree twice. Is there a clever file matching method I have missed? Or a simple way to filter them out if I start by getting all csv files in the tree?

Jacob Snyder · Accepted Answer

If you are free to use a different module, I suggest regular expressions (the re module in the standard library). I have found them useful for picking out specific file and directory naming patterns while iterating through a directory.

Here is a little information on regular expressions if they are unfamiliar.

Python Documentation on regex: https://docs.python.org/2/library/re.html

Regex testing tool (works well, although it is aimed at Ruby): http://rubular.com/

import os
import re

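def searchDirectory(cwd,searchParam,searchResults):
    """Recursively collect paths under cwd that match the regex searchParam."""
    for entry in os.listdir(cwd):
        fullpath = os.path.join(cwd,entry)
        # Recurse into subdirectories before testing the current path
        if os.path.isdir(fullpath):
            searchDirectory(fullpath,searchParam,searchResults)
        # Test files and directories alike against the pattern
        if re.search(searchParam,fullpath):
            searchResults.append(fullpath)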

The function iterates through a directory's contents and makes a recursive call if and only if the current item is itself a directory. It then runs a regular expression search over the full path of the current item, whether that item is a file or a directory. Each item in the tree is visited only once.

I store the matching paths in a list for simplicity's sake, but you can do something else with them by changing the body of the if statement that checks for a regular expression match:

        if re.search(searchParam,fullpath):
            searchResults.append(fullpath)
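
As a variation on the same single-pass idea, here is a minimal sketch that lets os.walk handle the recursion and yields each match, so the caller decides what to do with it. The name find_matching is hypothetical and not part of the function above.

```python
import os
import re

def find_matching(root, search_param):
    """Yield every file or directory path below root that matches search_param."""
    regex = re.compile(search_param)
    for current_dir, dir_names, file_names in os.walk(root):
        # os.walk lists each directory once, so every entry is tested once
        for name in dir_names + file_names:
            full_path = os.path.join(current_dir, name)
            if regex.search(full_path):
                yield full_path
```

Looping over find_matching(root, searchParam) and printing, copying, or collecting each path replaces the append in the if statement.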

I ran the code below with a small test directory.

# Each \\ matches one literal backslash (the Windows path separator)
searchParam = r'(foo\\bar\\.*\.txt|foo\\.*bar\.txt)'
root = os.getcwd()
searchResults = []
searchDirectory(root,searchParam,searchResults)
print(searchResults)

My results after running:

\foo\bar\baz.txt
\foo\bar\biz\qua.txt
\foo\bar.txt
\foo\baz\bar.txt

As a note, I am using Python 2.7 with the Anaconda distribution.

Edit: I used text files to set up the test directory quickly, but the approach works the same way if you change the extension in the regular expression (for example to csv).
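
To tie this back to the question's "foo/bar" pattern, here is a rough sketch of how the regular expression could be built from such a pattern instead of being written by hand. The helper name pattern_to_regex is hypothetical, and the sketch assumes the pattern may match at any depth below the search root, as the os.walk loop in the question does.

```python
import os
import re

def pattern_to_regex(pattern, extension=".csv"):
    """Build a regex for '<pattern>.csv' or any '.csv' path below '<pattern>/'."""
    sep = re.escape(os.sep)
    # Translate the '/'-separated pattern into escaped, OS-specific path parts
    parts = sep.join(re.escape(part) for part in pattern.split("/"))
    ext = re.escape(extension)
    # Anchor at a separator (or the string start) so 'xfoo/bar.csv' is rejected
    return r"(?:^|%s)%s(?:%s|%s.*%s)$" % (sep, parts, ext, sep, ext)
```

The returned string can be passed as searchParam to searchDirectory. Under these assumptions, "foo/bar" accepts foo/bar.csv, foo/bar/baz.csv and foo/bar/baz/qux.csv, but not foo/barn.csv.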
