Philalethes

Reputation: 115

Prevent os.walk from stopping after finding one subdirectory without file type that I am filtering for

I am trying to walk through the subdirectories of a parent directory looking for the .xlsx file with the newest date in the file name in each subdirectory. The naming convention for my files will be such that they will start with the date and then filename.

ex. 20180621 file name.xlsx

This way I can find the newest file from each subdirectory and run my script on them.

I have the following code, which only works if every directory, including the parent directory, contains a .xlsx file. If any one of the directories has no .xlsx file, the code raises ValueError: max() arg is an empty sequence and exits without continuing the search.

Parent Directory
----subdirectory1
--------subdirectory1.1
----subdirectory2
----subdirectory3
----etc.

How can I get os.walk to continue searching through all the subdirectories even after it reaches one that does not contain the .xlsx file I am looking for (including when the parent directory itself has no .xlsx file)?

for root, dirs, files in os.walk(path):
    list_of_files = []
    for file in files:
        if file.endswith(".xlsx"):
            list_of_files.append(file)
    largest = max(list_of_files)
    print (largest)

Upvotes: 0

Views: 371

Answers (2)

Martijn Pieters

Reputation: 1121972

os.walk() can't continue because an exception was raised. Either don't call max() with an empty list, catch the exception, or tell max() to return a default value if the list is empty.

You can simply skip the max() call when there are no Excel files; the test if list_of_files: is false when the list is empty:

for root, dirs, files in os.walk(path):
    list_of_files = []
    for file in files:
        if file.endswith(".xlsx"):
            list_of_files.append(file)
    largest = None
    if list_of_files:
        largest = max(list_of_files)
    print(largest or 'No Excel files in this directory')

If you are using Python 3.4 or newer, you can also tell the max() function to return a default value if your input list is empty:

for root, dirs, files in os.walk(path):
    list_of_files = []
    for file in files:
        if file.endswith(".xlsx"):
            list_of_files.append(file)
    largest = max(list_of_files, default=None)  # default must be passed by keyword
    print(largest or 'No Excel files in this directory')
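As a minimal sketch of how the fallback behaves: the value has to be passed as the keyword argument default=, because a second positional argument is treated as another item to compare. The helper name newest_name is hypothetical and only used for illustration.

```python
def newest_name(names):
    """Return the largest name, or None when names is empty.

    default= is keyword-only; max(names, None) would try to compare
    the list itself against None instead of using None as a fallback.
    """
    return max(names, default=None)


# Names starting with a YYYYMMDD date sort chronologically as plain strings.
print(newest_name([]))
print(newest_name(["20180621 file name.xlsx", "20180702 file name.xlsx"]))
```

The first call prints None instead of raising ValueError, so a loop around it keeps going.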

Last but not least, you can use try...except ValueError: to handle the exception thrown:

for root, dirs, files in os.walk(path):
    list_of_files = []
    for file in files:
        if file.endswith(".xlsx"):
            list_of_files.append(file)
    try:
        largest = max(list_of_files)
        print(largest)
    except ValueError:
        print('No Excel files in this directory')

You can simplify your code by using the fnmatch.filter() function to select the matching files:

import fnmatch
import os

for root, dirs, files in os.walk(path):
    excel_files = fnmatch.filter(files, '*.xlsx')
    largest = max(excel_files, default=None)
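Putting the pieces together, here is one possible sketch that collects the newest .xlsx file of every directory so a script can be run on each afterwards. The function name newest_xlsx_by_directory and the returned dictionary layout are assumptions for illustration, not part of the question; it relies on the file names starting with a YYYYMMDD date so that string comparison matches date order.

```python
import fnmatch
import os


def newest_xlsx_by_directory(path):
    """Map each directory under path to the full path of its newest .xlsx file.

    Directories without any .xlsx file are skipped, so the walk
    never stops early on an empty match list.
    """
    newest = {}
    for root, dirs, files in os.walk(path):
        excel_files = fnmatch.filter(files, '*.xlsx')
        largest = max(excel_files, default=None)  # None when nothing matched
        if largest is not None:
            newest[root] = os.path.join(root, largest)
    return newest
```

Calling newest_xlsx_by_directory(path) and iterating over the values of the result gives one file per directory that actually contains Excel files.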

Upvotes: 4

Adam Smith

Reputation: 54213

os.walk doesn't stop by itself; max raises an error when the list is empty. You can handle this in a couple of ways:

...
for file in files:
    if file.endswith(".xlsx"):
        list_of_files.append(file)
if list_of_files:  # if it's not blank...
    print(max(list_of_files))

or

...
for file in files:
    if file.endswith(".xlsx"):
        list_of_files.append(file)
try:
    print(max(list_of_files))
except ValueError:  # something goes wrong with `max` (or `print` I guess)
    # what do we do? Probably...
    pass

Upvotes: 1
