sco1

Reputation: 12214

Refining fnmatch pattern for more specific results

Brand new to Python, coming from MATLAB. Essentially no UNIX or regexp knowledge.

I have some data for processing sorted into folders. I'd like to get a list of files to process, so I prompt for a top level folder and search everything in that folder and subfolders for a match. Between the documentation for Python and various things here on SO I've gotten most of the way there:

from Tkinter import Tk
import tkFileDialog
import os
import fnmatch

def recursivedecodeprompt():
    root = Tk()
    root.withdraw()
    toplevel = tkFileDialog.askdirectory(title='Select Top Level Directory')

    filelist = []
    for root, dirnames, filenames in os.walk(toplevel):
        for filename in fnmatch.filter(filenames, 'LOG.*'):
            filelist.append(os.path.join(root, filename))

    return filelist

My question is in relation to the pattern string. My folders could have just a LOG.001 file in them, or they could have LOG.001, LOG.001.csv, LOG.001.gps, etc., which my current pattern also matches. I thought I could be clever and use 'LOG.???' but it returns the same list.

Is there a simple way to have fnmatch ignore files with anything appended after the 3-digit ID? Is there a more appropriate tool for the job?

Semi-related side question: Is there a way to allow the tkFileDialog.askdirectory() dialog to be resizable?

EDIT: To clarify, the numeric part of the filename can and will change, so I can have LOG.001, LOG.002, LOG.003, etc. I wish it was a less annoying naming convention but that's how it comes out of the device.

Upvotes: 2

Views: 164

Answers (2)

Padraic Cunningham

Reputation: 180441

Using re:

import re

filenames = ["LOG.001","LOG.002","LOG.001.csv","LOG.003.csv","LOG.1002"]
print([x for x in filenames if re.search(r"LOG\.\d+$", x)])

['LOG.001', 'LOG.002', 'LOG.1002']
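
A minimal sketch of how a regex filter like this could slot into the os.walk loop from the question. The names find_logs and LOG_RE are illustrative and not from the original post; the pattern escapes the dot and anchors both ends, on the assumption that valid names are "LOG." followed only by digits.

```python
import os
import re

# Assumed pattern: "LOG." followed by one or more digits and nothing else.
LOG_RE = re.compile(r"^LOG\.\d+$")

def find_logs(toplevel):
    """Return full paths of files under toplevel whose names match LOG_RE."""
    filelist = []
    for root, dirnames, filenames in os.walk(toplevel):
        for filename in filenames:
            if LOG_RE.match(filename):
                filelist.append(os.path.join(root, filename))
    return filelist
```

If the ID is always exactly three digits, \d{3} in place of \d+ would also exclude names such as LOG.1002.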

Upvotes: 3

holdenweb

Reputation: 37043

From what you say, it seems that the only valid filenames are exactly seven characters long. So the simplest way would seem to be to include

if len(filename) != 7:
    continue

as the first line of the inner loop. This skips the rest of the current iteration unless the filename is exactly seven characters long. No regular expressions required!
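
A rough sketch of where that check would sit, reusing the os.walk loop from the question. The function name find_logs_by_length is made up for illustration, and the check assumes the ID is always three characters, so a name like LOG.1002 would be skipped while any seven-character name starting with "LOG." would pass.

```python
import fnmatch
import os

def find_logs_by_length(toplevel):
    """Collect LOG.* files whose names are exactly seven characters long."""
    filelist = []
    for root, dirnames, filenames in os.walk(toplevel):
        for filename in fnmatch.filter(filenames, 'LOG.*'):
            # Skip names with anything beyond "LOG." plus three characters.
            if len(filename) != 7:
                continue
            filelist.append(os.path.join(root, filename))
    return filelist
```

fnmatch also supports character classes, so a pattern such as 'LOG.[0-9][0-9][0-9]' may be an alternative that additionally requires the three characters to be digits.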

Upvotes: 2
