AdamMc331

Reputation: 16691

Filter for file extensions using os.walk but with restrictions on file name

I am using os.walk to iterate through a directory and want to count the number of lines I have in .java files of that directory. I've found from other answers that I can use fnmatch.filter to only get .java files like this:

for root, dirs, files in os.walk(project_directory):
    for file in fnmatch.filter(files, '*.java'):
        # get line count

However, I want to exclude a few files that have a specific name, say MyExclusion.java. How can I enhance the filter to avoid searching these files? The best I can figure out is to add another conditional:

for root, dirs, files in os.walk(project_directory):
    for file in fnmatch.filter(files, '*.java'):
        if file != 'MyExclusion.java':
            # get line count

Can fnmatch.filter be used to do this, or am I forced to add a conditional check here?

Upvotes: 1

Views: 304

Answers (1)

metatoaster

Reputation: 18908

fnmatch.filter only accepts a single inclusion pattern, so it cannot exclude names by itself. You can pass its result to another filter function, or simply use a list comprehension:

>>> import fnmatch
>>> files = ['manifest.xml', 'Test.java', 'Foo.java', 'MyExclusion.java']
>>> [f for f in fnmatch.filter(files, '*.java')
...     if f not in ('MyExclusion.java', 'Bad.java')]
['Test.java', 'Foo.java']
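
The "another filter function" option can be sketched with the built-in filter(). The names EXCLUDED and java_files are illustrative and do not come from the question:

```python
import fnmatch

files = ['manifest.xml', 'Test.java', 'Foo.java', 'MyExclusion.java']

# Hypothetical set of file names to skip; a set keeps membership tests cheap.
EXCLUDED = {'MyExclusion.java', 'Bad.java'}

# First keep only the *.java names, then drop the excluded ones.
java_files = list(filter(lambda name: name not in EXCLUDED,
                         fnmatch.filter(files, '*.java')))
print(java_files)
```

This is equivalent to the list comprehension above; which one reads better is mostly a matter of taste.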

Alternatively, compile a regular expression and use it in the if condition:

>>> import re
>>> patt = re.compile('^(MyExclusion|Bad)')
>>> [i for i in fnmatch.filter(files, '*.java') if not patt.search(i)]
['Test.java', 'Foo.java']
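
One caveat: the pattern above is anchored only at the start, so it also drops any name that merely begins with MyExclusion or Bad. If exact names are what you want, a sketch along these lines may be closer (the sample file BadgerTest.java and the variable names are made up for illustration):

```python
import fnmatch
import re

files = ['manifest.xml', 'Test.java', 'BadgerTest.java', 'MyExclusion.java']

# Prefix match: excludes anything starting with "MyExclusion" or "Bad".
prefix_patt = re.compile(r'^(MyExclusion|Bad)')
# Exact match: used with fullmatch(), excludes only the two listed names.
exact_patt = re.compile(r'(MyExclusion|Bad)\.java')

java = fnmatch.filter(files, '*.java')
by_prefix = [f for f in java if not prefix_patt.search(f)]
by_exact = [f for f in java if not exact_patt.fullmatch(f)]

print(by_prefix)  # BadgerTest.java is dropped as well
print(by_exact)   # BadgerTest.java is kept
```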

Consider using a generator expression instead of a list comprehension, for example:

    for file in (i for i in fnmatch.filter(files, '*.java') if not patt.search(i)):
        # get line count

This avoids building the second list all at once, which can reduce memory consumption.
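
Putting the pieces together for the original question, a minimal sketch of the line counter might look like the following. The function name count_java_lines, its excluded parameter, and the choice to read files as UTF-8 with undecodable bytes replaced are all assumptions for illustration:

```python
import fnmatch
import os


def count_java_lines(project_directory, excluded=('MyExclusion.java',)):
    """Count lines in all *.java files under project_directory,
    skipping any file whose name appears in excluded."""
    total = 0
    for root, dirs, files in os.walk(project_directory):
        # Generator expression: excluded names are skipped lazily.
        wanted = (f for f in fnmatch.filter(files, '*.java')
                  if f not in excluded)
        for name in wanted:
            path = os.path.join(root, name)
            # Assumed encoding; adjust if the sources are not UTF-8.
            with open(path, encoding='utf-8', errors='replace') as handle:
                total += sum(1 for _ in handle)
    return total
```

Calling count_java_lines(project_directory) returns the total across the whole tree; pass a different tuple or set as excluded to skip other file names.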

Upvotes: 1
