Reputation: 7490
I have a piece of code that traverses a directory tree using os.walk and, in each matching directory, collects a list of all the PDF files there.
To get the list of PDF files in a particular directory during the traversal, I use glob like this:
file_list = glob.glob(os.path.join(root,invoice_dir_name, "*.pdf"))
It fetches all files in the directory whose names end with .pdf.
But I just found a corner case: if the directory contains PDF files whose names end in .PDF, it returns an empty list, because the pattern only matches the lowercase .pdf extension.
How can I use a regular expression in the glob call so that it matches either .pdf or .PDF? I tried
file_list = glob.glob(os.path.join(root,invoice_dir_name, "*.(pdf|PDF)"))
but obviously it doesn't work.
My code is built around glob and os.walk, and switching to anything else would mean redoing the code, so I was wondering whether a solution can be found with glob. Thanks
Upvotes: 0
Views: 1418
Reputation: 2256
How about searching for .pdf and .PDF separately and collecting the results into a single list? This way, the matches for both patterns are combined and returned together.
import glob
import os

def get_files(root, dir_name, pattern):
    patterns = [os.path.join(root, dir_name, pattern.upper()), os.path.join(root, dir_name, pattern.lower())]
    return [filename for p in patterns for filename in glob.glob(p)]
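One caveat with running two patterns: on platforms where glob matching is case-insensitive (Windows, as far as I know), both patterns may match the same file, so it could appear twice in the result. A minimal sketch of a de-duplicating variant follows; the name get_files_unique is made up for illustration, and like the function above it still misses mixed-case extensions such as .Pdf.

```python
import glob
import os

def get_files_unique(root, dir_name, pattern):
    """Glob with the upper- and lower-cased pattern, dropping repeated paths."""
    patterns = [
        os.path.join(root, dir_name, pattern.upper()),
        os.path.join(root, dir_name, pattern.lower()),
    ]
    matches = [filename for p in patterns for filename in glob.glob(p)]
    # dict.fromkeys keeps the first occurrence of each path, in order.
    return list(dict.fromkeys(matches))
```

It is called the same way as get_files, e.g. get_files_unique(root, invoice_dir_name, "*.pdf").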
If not a new function, then simply replace:
file_list = glob.glob(os.path.join(root,invoice_dir_name, "*.pdf"))
with:
pattern = "*.pdf"
p_upper = os.path.join(root, invoice_dir_name, pattern.upper())
p_lower = os.path.join(root, invoice_dir_name, pattern.lower())
file_list = [fname for p in (p_upper, p_lower) for fname in glob.glob(p)]
Output:
[
'/Users/username/docs/37-sbc-sleep-apnea-2018.PDF',
'/Users/username/docs/notice.pdf',
'/Users/username/docs/Health2020.pdf',
'/Users/username/docs/West.pdf',
'/Users/username/docs/hello-Health-net-excel-file-2020.pdf',
'/Users/username/docs/2018-arbitration-form-english.pdf'
]
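If you would rather keep a single glob call: glob patterns are not regular expressions, so alternation like (pdf|PDF) is not available, but they do support character classes in square brackets. A rough sketch, assuming you only care about the extension's letter case; the helper name glob_pdfs_any_case is hypothetical, and glob.escape is there only to protect directory names that happen to contain glob metacharacters.

```python
import glob
import os

def glob_pdfs_any_case(directory):
    """Return paths in `directory` ending in .pdf, in any letter case."""
    # Each bracket pair matches exactly one character, e.g. [pP] is 'p' or 'P',
    # so this also catches mixed-case extensions such as .Pdf.
    pattern = os.path.join(glob.escape(directory), "*.[pP][dD][fF]")
    return sorted(glob.glob(pattern))
```

In the original code this would amount to passing "*.[pP][dD][fF]" instead of "*.pdf" in the existing os.path.join call, so nothing else around glob and os.walk has to change.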
Upvotes: 1