Eric

Reputation: 295

returning latest file in directory for specific format

I have a directory with files of the format:

test_report-01-13-2014.11_53-en.zip
test_report-12-04-2013.11_53-en.zip

and I need to return the latest file based on the date in the file name, not the date the file was last modified. If I went by modification time, I could end up with the 2013 file instead, which would be wrong. I am doing the following, but it's not working. I am passing in the following parameters:

mypath = "C:\\temp\\test\\"
mypattern = "test_report-%m-%d-%Y*"
myfile = getLatestFile(mypath, mypattern)

def getLatestFile(path="./", pattern="*"):
   fformat= path + pattern
   archives = glob.glob(fformat)

   if len(archives) > 0:
       return archives[-1]
   else:
       return None

Any idea what could be the cause of the problem?

Upvotes: 1

Views: 1711

Answers (5)

some_weired_user

Reputation: 586

If you would like to sort your list by name, just do archives = sorted(glob.glob(fformat))

Upvotes: 0

Burhan Khalid

Reputation: 174624

glob returns matching paths in an arbitrary order, and it doesn't understand %m-%d-%Y (it's not that smart).
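
To illustrate the point about glob: it only understands shell-style wildcards (*, ? and [...]), so the %m-%d-%Y in the question's pattern is matched as literal text. Here is a minimal sketch using fnmatch, the module glob relies on for wildcard matching. The character-class pattern is just one possible way to narrow the match, not something taken from the question:

```python
import fnmatch

name = "test_report-01-13-2014.11_53-en.zip"

# strptime directives are treated as literal text by shell-style wildcards
strptime_style = "test_report-%m-%d-%Y*"

# A wildcard pattern that accepts two digits, two digits, then four digits
digit = "[0-9]"
wildcard_style = "test_report-" + digit * 2 + "-" + digit * 2 + "-" + digit * 4 + "*"

literal_match = fnmatch.fnmatchcase(name, strptime_style)
wildcard_match = fnmatch.fnmatchcase(name, wildcard_style)
print(literal_match, wildcard_match)
```

Even with the narrower wildcard pattern, the matches still come back in no particular order, so the date has to be parsed and used as the sort key.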

You need to read the list of paths, extract the file name, then get the date from the file name. This will be the key that you will use to sort the list of files.

Here is one way to do just that:

import glob
import os
import datetime

def sorter(path):
    filename = os.path.basename(path)
    return datetime.datetime.strptime(filename[12:22], '%m-%d-%Y')

pattern = "test_report-*"
search_path = 'C:/temp/test/'  # a raw string cannot end in a backslash; 'C:\\temp\\test\\' also works

file_list = glob.glob(search_path + pattern)

# Order by the date
ordered_list = sorted(file_list, key=sorter, reverse=True)

os.path.basename is a function to return the last component of a path; since glob will return the full path, the last component will be the file name.

As your file name has a fixed format - instead of mucking with regular expressions I just grabbed the date part by slicing the file name, and converted it to a datetime object.

Finally, sorted returns a new, sorted list (the list.sort method sorts in place and returns None). The key function is what extracts the date from each path and returns it; reverse=True is required to get the returned list in latest-first order.
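
A small sketch of the difference between sorted and list.sort, using bare date strings instead of real paths. The parse_date helper and the sample dates are made up for illustration:

```python
import datetime

def parse_date(text):
    # Hypothetical helper: turn 'MM-DD-YYYY' into a datetime for comparison
    return datetime.datetime.strptime(text, "%m-%d-%Y")

dates = ["12-04-2013", "01-13-2014", "06-30-2013"]

# sorted() leaves the input alone and returns a new list
newest_first = sorted(dates, key=parse_date, reverse=True)

# list.sort() reorders the list in place and returns None
in_place = list(dates)
result = in_place.sort(key=parse_date)

print(newest_first)
print(in_place, result)
```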

You can shorten the code a bit by passing the result of glob.glob directly to sorted:

ordered_list = sorted(glob.glob(search_path + pattern), key=sorter, reverse=True)

To combine this with the function you have written:

import glob, os, datetime

def sorter(path):
    filename = os.path.basename(path)
    return datetime.datetime.strptime(filename[12:22], '%m-%d-%Y')

def getLatestFile(path="./", pattern="*"):
   fformat = path + pattern
   archives = glob.glob(fformat)

   if len(archives):
      return sorted(archives, key=sorter, reverse=True)[0]
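
Since only the newest file is needed, a full sort is not strictly necessary. As a possible variation (not part of the code above), max with the same kind of key function makes a single pass; the default argument needs Python 3.4 or later. The names date_key and latest_by_name and the sample paths are illustrative assumptions:

```python
import datetime
import os

def date_key(path):
    # Assumes names like test_report-MM-DD-YYYY...; the date sits at [12:22]
    filename = os.path.basename(path)
    return datetime.datetime.strptime(filename[12:22], "%m-%d-%Y")

def latest_by_name(paths):
    # max() makes a single pass; default=None covers an empty list
    return max(paths, key=date_key, default=None)

paths = [
    "C:/temp/test/test_report-12-04-2013.11_53-en.zip",
    "C:/temp/test/test_report-01-13-2014.11_53-en.zip",
]
latest = latest_by_name(paths)
print(latest)
```

In practice paths would come from glob.glob; a hard-coded list is used here so the sketch runs without touching the file system.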

Upvotes: 2

mhlester

Reputation: 23221

The order of archives is arbitrary, and on top of that your filenames can't be sorted alphabetically, because the month comes before the year. The easiest way is to sort your list with a key function that extracts a datetime object from the filename:

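import datetime
import os

def getDateFromFilename(filename):
    # Drop any directory part, then the 12-character 'test_report-' prefix and the '-en.zip' suffix
    name = os.path.basename(filename)
    try:
        return datetime.datetime.strptime(name[12:-7], '%m-%d-%Y.%H_%M')
    except ValueError:
        # Unparseable names sort first; datetime.min stays comparable with real dates
        return datetime.datetime.min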

archives.sort(key=getDateFromFilename)
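
To see why alphabetical order goes wrong here, this sketch compares a plain sort with a date-keyed sort on the two file names from the question. The helper name date_from_name is only illustrative:

```python
import datetime

names = [
    "test_report-01-13-2014.11_53-en.zip",
    "test_report-12-04-2013.11_53-en.zip",
]

def date_from_name(name):
    # Strip the 12-character prefix and the 7-character '-en.zip' suffix
    return datetime.datetime.strptime(name[12:-7], "%m-%d-%Y.%H_%M")

# Plain string comparison ranks month 12 after month 01, regardless of year
alphabetical_last = sorted(names)[-1]

# Comparing parsed datetimes ranks 2014 after 2013
by_date_last = sorted(names, key=date_from_name)[-1]

print(alphabetical_last)
print(by_date_last)
```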

Upvotes: 1

Eric

Reputation: 295

Thanks a lot for the input. I used a little bit of everything and ended up with this, which works fine for my purposes.

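import datetime
import os

def getDateFromFilename(filename, prefix):
    # Parse the timestamp embedded in names like test_report-01-13-2014.11_53-en.zip
    try:
        return datetime.datetime.strptime(filename, prefix + '%m-%d-%Y.%H_%M-en.zip')
    except ValueError:
        # Unparseable names sort first; datetime.min stays comparable with real dates
        return datetime.datetime.min

def getLatestFile(path, pattern):
    # pattern is the literal prefix, e.g. "test_report-"
    files = [f for f in os.listdir(path) if f.startswith(pattern)]
    files.sort(key=lambda f: getDateFromFilename(f, pattern))

    if len(files) > 0:
        return files[-1]
    else:
        return None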

Upvotes: 0

user1907906

Reputation:

See the Python documentation:

os.listdir(path='.')

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.

So you must either use a stricter filter or order the returned list.
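
Both options can be combined. The following is a rough sketch, assuming the file names always look like the ones in the question; the regular expression and the function name latest_report are my own choices, not from the documentation:

```python
import datetime
import os
import re
import tempfile

# Hypothetical strict filter for names like test_report-01-13-2014.11_53-en.zip
NAME_RE = re.compile(r"^test_report-(\d{2}-\d{2}-\d{4}\.\d{2}_\d{2})-en\.zip$")

def latest_report(directory):
    dated = []
    for name in os.listdir(directory):
        match = NAME_RE.match(name)
        if match is None:
            continue  # skip anything that does not fit the expected format
        try:
            stamp = datetime.datetime.strptime(match.group(1), "%m-%d-%Y.%H_%M")
        except ValueError:
            continue  # right shape, but not a real date (e.g. month 13)
        dated.append((stamp, name))
    # Tuples compare by timestamp first, so max() picks the newest name
    return max(dated)[1] if dated else None

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("test_report-01-13-2014.11_53-en.zip",
                 "test_report-12-04-2013.11_53-en.zip",
                 "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    newest = latest_report(tmp)

print(newest)
```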

Upvotes: 0
