Reputation: 1607
I have a small bit of code to list file names that match a filter string. I'm trying to extend this to match against multiple filters. I have some working code which takes a very straight forward loop approach but it is sloooooooow.... basically running os.walk
for every filter.
Given the function (shown below) is there a way to test against multiple filters at once, rather than 1 at a time? i.e. can I past a list of filter strings to find_files
?
import os
import fnmatch
# stolen from http://stackoverflow.com/questions/8625991/use-python-os-walk-to-identify-a-list-of-files
def find_files(dir_look, filt):
matches = []
for root, dirnames, filenames in os.walk(dir_look):
for filename in fnmatch.filter(filenames, filt):
matches.append(os.path.join(root, filename))
return matches
#create empty list to store results
filelist=[]
#some example filters, my real data has about 5000 filters
filts = [r'*60830007*',r'*60910259*',r'*60910299*']
#find files for each filter entry
for filter in filts:
filelist.append(find_files(r'C:\some directory', filter))
EDIT:
I found a rather obvious way to speed things up by passing the list of filters to the function then testing each inside the os.walk
def find_files(dir_look, filters):
matches = []
for root, dirnames, filenames in os.walk(dir_look):
for filt in filters:
for filename in fnmatch.filter(filenames, filt):
matches.append(os.path.join(root, filename))
return matches
Upvotes: 2
Views: 1403
Reputation: 21
This answer will be about algorithms and data structures, instead of python programming.
If you want to test lot of pattern against a string then you should choose a better structure for representation. Instead of char array we use suffix-trees. (For python implementations see this question.
If some of your filters has common parts (especially if they have the same prefix) you should represent them as trie(s). So this way you can test simultaneously with more than one pattern. This solution creates an overhead of building the tree(s) but if you use the same filters multiple times then it's worth.
Upvotes: 2