Reputation: 12214
Brand new to Python, coming from MATLAB. Essentially no UNIX or regexp knowledge.
I have some data for processing sorted into folders. I'd like to get a list of files to process, so I prompt for a top level folder and search everything in that folder and subfolders for a match. Between the documentation for Python and various things here on SO I've gotten most of the way there:
    from Tkinter import Tk
    import tkFileDialog
    import os
    import fnmatch

    def recursivedecodeprompt():
        root = Tk()
        root.withdraw()  # hide the empty main window; only the dialog is shown
        toplevel = tkFileDialog.askdirectory(title='Select Top Level Directory')
        filelist = []
        # Walk toplevel and every subfolder, keeping files whose names match the pattern
        for dirpath, dirnames, filenames in os.walk(toplevel):
            for filename in fnmatch.filter(filenames, 'LOG.*'):
                filelist.append(os.path.join(dirpath, filename))
        return filelist
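For anyone on Python 3, where the Tkinter and tkFileDialog modules were renamed to tkinter and tkinter.filedialog, the same approach might be sketched as below. This is an illustrative sketch, not the asker's code: the names find_files and recursive_decode_prompt are hypothetical, and the directory search is split out from the dialog so it can be called (or tested) with a plain path and no GUI.

```python
import fnmatch
import os


def find_files(toplevel, pattern):
    """Return paths under toplevel (recursively) whose file name matches pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(toplevel):
        for filename in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, filename))
    return matches


def recursive_decode_prompt(pattern='LOG.*'):
    """Ask for a top-level folder, then search it (hypothetical helper)."""
    # Imported here so find_files() works without a display or Tk installed.
    from tkinter import Tk, filedialog
    root = Tk()
    root.withdraw()  # hide the empty main window; only the dialog is shown
    toplevel = filedialog.askdirectory(title='Select Top Level Directory')
    root.destroy()
    if not toplevel:  # the user cancelled the dialog
        return []
    return find_files(toplevel, pattern)
```

Keeping the search in its own function also makes it easy to swap the pattern or the matching logic without touching the dialog code.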
My question is about the pattern string. My folders could have just a LOG.001 file in them, or they could have LOG.001, LOG.001.csv, LOG.001.gps, etc., which my current pattern also matches. I thought I could be clever and use 'LOG.???' but it returns the same list.
Is there a simple way to have fnmatch ignore files with anything appended after the 3-digit ID? Is there a more appropriate tool for the job?
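fnmatch patterns also support character classes, where [0-9] matches exactly one digit, and a pattern has to match the whole file name. A minimal sketch, assuming the ID is always exactly three digits and using a made-up list of names:

```python
import fnmatch

names = ['LOG.001', 'LOG.002', 'LOG.001.csv', 'LOG.001.gps', 'LOG.abc', 'LOG.1002']

# Each [0-9] matches exactly one digit, and the pattern must match the whole
# name, so anything appended after the three digits is rejected.
three_digit_logs = fnmatch.filter(names, 'LOG.[0-9][0-9][0-9]')
print(three_digit_logs)  # ['LOG.001', 'LOG.002']
```

Note that this rejects four-digit IDs such as LOG.1002; if those can occur, a regular expression is the more flexible choice.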
Semi-related side question: Is there a way to make the tkFileDialog.askdirectory() dialog resizable?
EDIT: To clarify, the numeric part of the filename can and will change, so I can have LOG.001, LOG.002, LOG.003, etc. I wish it were a less annoying naming convention, but that's how it comes out of the device.
Upvotes: 2
Views: 164
Reputation: 180441
Using re:
    import re

    filenames = ["LOG.001", "LOG.002", "LOG.001.csv", "LOG.003.csv", "LOG.1002"]
    # re.match anchors at the start, \. is a literal dot, \d+ is one or more digits
    print([x for x in filenames if re.match(r"LOG\.\d+$", x)])
['LOG.001', 'LOG.002', 'LOG.1002']
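The dot and the anchors matter in that regular expression: an unescaped . matches any character, and re.search looks for a match anywhere in the string. A small comparison on made-up names (the entries LOGX001 and OLDLOG.001 are hypothetical, added only to show the difference) illustrates what a loose pattern lets through; re.fullmatch requires Python 3.4 or later.

```python
import re

names = ['LOG.001', 'LOG.002', 'LOG.001.csv', 'LOG.003.csv', 'LOG.1002',
         'LOGX001', 'OLDLOG.001']

# Loose: '.' matches any character and search() can start anywhere in the name.
loose = [x for x in names if re.search(r"LOG.\d+$", x)]

# Strict: the whole name must be 'LOG', a literal dot, then one or more digits.
strict_re = re.compile(r"LOG\.\d+")
strict = [x for x in names if strict_re.fullmatch(x)]

print(loose)   # ['LOG.001', 'LOG.002', 'LOG.1002', 'LOGX001', 'OLDLOG.001']
print(strict)  # ['LOG.001', 'LOG.002', 'LOG.1002']
```

Compiling the pattern once is a small convenience when it is reused for every file inside an os.walk loop.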
Upvotes: 3
Reputation: 37043
From what you say, it seems that the only valid filenames are exactly seven characters long. So the simplest approach would be to put

    if len(filename) != 7:
        continue

as the first line of the inner loop. continue skips the rest of the current iteration, so only filenames that are exactly seven characters long get appended. No regular expressions required!
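The length test alone would also accept other seven-character names such as LOG.csv. A minimal sketch that adds a prefix and digit check, assuming IDs are always exactly three digits; is_log_file is a hypothetical helper name:

```python
def is_log_file(filename):
    """True for names like 'LOG.001': 7 characters, 'LOG.' then 3 digit characters."""
    # The length check alone would also let through e.g. 'LOG.csv' or 'ABC.123'.
    return (len(filename) == 7
            and filename.startswith('LOG.')
            and filename[4:].isdigit())


names = ['LOG.001', 'LOG.001.csv', 'LOG.csv', 'ABC.123', 'LOG.042']
print([n for n in names if is_log_file(n)])  # ['LOG.001', 'LOG.042']
```

Inside the original loop this would read `if not is_log_file(filename): continue`.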
Upvotes: 2