I have a directory with files of the format:
test_report-01-13-2014.11_53-en.zip
test_report-12-04-2013.11_53-en.zip
and I need to return the latest file based on the date in the file name, not the date the file was last modified. If I went by modification time, I could end up with the 2013 file instead, which would be wrong. I am doing the following, but it's not working. I am passing in the following parameters:
mypath = "C:\\temp\\test\\"
mypattern = "test_report-%m-%d-%Y*"
myfile = getLatestFile(mypath, mypattern)
def getLatestFile(path="./", pattern="*"):
    fformat = path + pattern
    archives = glob.glob(fformat)
    if len(archives) > 0:
        return archives[-1]
    else:
        return None
Any idea what could be causing the problem?
If you would like to sort your list by name, just do archives = sorted(glob.glob(fformat))
glob returns matching paths in an arbitrary order, and it doesn't understand %m-%d-%Y (it's not that smart).
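To see why the original pattern finds nothing useful, here is a small sketch using fnmatch, the shell-style matcher that the glob module relies on. It only compares strings, so no files are needed. The variable names are made up for illustration.

```python
import fnmatch

name = "test_report-01-13-2014.11_53-en.zip"

# Shell-style matching treats "%m", "%d" and "%Y" as literal characters
literal_match = fnmatch.fnmatch(name, "test_report-%m-%d-%Y*")

# A plain wildcard is what actually matches the file name
wildcard_match = fnmatch.fnmatch(name, "test_report-*")

print(literal_match, wildcard_match)
```

Only the wildcard pattern matches, so the date has to be parsed out of the name after globbing.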
You need to read the list of paths, extract the file name, then get the date from the file name. This will be the key that you will use to sort the list of files.
Here is one way to do just that:
import glob
import os
import datetime

def sorter(path):
    # "test_report-" is 12 characters, so [12:22] is the MM-DD-YYYY part
    filename = os.path.basename(path)
    return datetime.datetime.strptime(filename[12:22], '%m-%d-%Y')

pattern = "test_report-*"
# A raw string cannot end in a backslash, so escape them instead
search_path = 'C:\\temp\\test\\'  # or 'c:/temp/test/'
file_list = glob.glob(search_path + pattern)
# Order by the date, latest first
ordered_list = sorted(file_list, key=sorter, reverse=True)
os.path.basename is a function to return the last component of a path; since glob will return the full path, the last component will be the file name.
As your file name has a fixed format - instead of mucking with regular expressions I just grabbed the date part by slicing the file name, and converted it to a datetime object.
Finally, sorted returns the result of the sort as a new list (the normal sort method sorts in place). The key function extracts the date and returns it; reverse=True is required to get the returned list with the latest file first.
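The difference between sorted and the in-place sort method can be sketched with the two sample file names from the question. The helper name date_key is made up for this example; the slicing is the same as in sorter.

```python
import datetime

names = [
    "test_report-01-13-2014.11_53-en.zip",
    "test_report-12-04-2013.11_53-en.zip",
]

def date_key(name):
    # Characters 12 to 21 hold the "MM-DD-YYYY" part of the file name
    return datetime.datetime.strptime(name[12:22], "%m-%d-%Y")

# sorted() builds a new list and leaves the input untouched
newest_first = sorted(names, key=date_key, reverse=True)

# list.sort() reorders the list in place and returns None
in_place = list(names)
result = in_place.sort(key=date_key)

print(newest_first[0])
print(in_place[-1], result)
```

Both approaches end up pointing at the 2014 file; the only difference is whether a new list is created.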
You can shorten the code a bit by passing the result of glob.glob directly to sorted:
ordered_list = sorted(glob.glob(search_path + pattern), key=sorter, reverse=True)
To combine this with the function you have written:
import glob, os, datetime

def sorter(path):
    filename = os.path.basename(path)
    return datetime.datetime.strptime(filename[12:22], '%m-%d-%Y')

def getLatestFile(path="./", pattern="*"):
    fformat = path + pattern
    archives = glob.glob(fformat)
    if len(archives):
        return sorted(archives, key=sorter, reverse=True)[0]
    return None
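Since only the newest file is needed, max with the same kind of key function is a possible alternative to sorting and taking the first element. This is a minimal sketch, not a replacement for the function above: date_from_path and latest_path are hypothetical names, and the reports/ paths are made up so that nothing has to exist on disk.

```python
import datetime
import os

def date_from_path(path):
    # Same slicing as sorter(): base name first, then the MM-DD-YYYY part
    filename = os.path.basename(path)
    return datetime.datetime.strptime(filename[12:22], "%m-%d-%Y")

def latest_path(paths):
    # max() scans the list once; default=None covers an empty list
    return max(paths, key=date_from_path, default=None)

paths = [
    "reports/test_report-01-13-2014.11_53-en.zip",
    "reports/test_report-12-04-2013.11_53-en.zip",
]
print(latest_path(paths))
print(latest_path([]))
```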
The order of archives is arbitrary, and on top of that your filenames can't be sorted alphabetically, because the month comes before the year. The easiest way is to sort your list with a key function that extracts a datetime object from the filename:
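import datetime
import os

def getDateFromFilename(filename):
    # Drop any directory part, then parse e.g. "01-13-2014.11_53"
    name = os.path.basename(filename)
    try:
        return datetime.datetime.strptime(name[12:-7], '%m-%d-%Y.%H_%M')
    except ValueError:
        # datetime.min sorts first and stays comparable with real dates
        return datetime.datetime.min

archives.sort(key=getDateFromFilename)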
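The alphabetical pitfall is easy to reproduce with the two names from the question. A small sketch, with parse_stamp as a made-up helper name:

```python
import datetime

names = [
    "test_report-12-04-2013.11_53-en.zip",
    "test_report-01-13-2014.11_53-en.zip",
]

def parse_stamp(name):
    # Skip the "test_report-" prefix (12 chars) and the "-en.zip" suffix (7 chars)
    return datetime.datetime.strptime(name[12:-7], "%m-%d-%Y.%H_%M")

by_name = sorted(names)                   # compares "01-..." with "12-..."
by_date = sorted(names, key=parse_stamp)  # compares real timestamps

print(by_name[-1])
print(by_date[-1])
```

Sorting by name puts the December 2013 file last, while sorting by the parsed date puts the January 2014 file last.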
Thanks a lot for the input. I used a little bit of everything and ended up with this, which works fine for my purposes.
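import datetime
import os

def getDateFromFilename(filename, prefix):
    # prefix is literal text such as "test_report-" (it must not contain "%")
    try:
        return datetime.datetime.strptime(filename, prefix + '%m-%d-%Y.%H_%M-en.zip')
    except ValueError:
        # Unparseable names sort first and stay comparable with real dates
        return datetime.datetime.min

def getLatestFile(path, pattern):
    files = [f for f in os.listdir(path) if f.startswith(pattern)]
    files.sort(key=lambda f: getDateFromFilename(f, pattern))
    if len(files) > 0:
        return files[-1]
    else:
        return None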
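The strptime call above works because any text in the format string that is not a % directive has to match the input literally. A short sketch of that behaviour, assuming the prefix is "test_report-" (a prefix containing "%" would be read as a directive and break this):

```python
import datetime

prefix = "test_report-"  # assumed value of the pattern argument
fmt = prefix + "%m-%d-%Y.%H_%M-en.zip"

# The whole file name is parsed in one go, literal parts included
stamp = datetime.datetime.strptime("test_report-01-13-2014.11_53-en.zip", fmt)
print(stamp)

# A name that does not fit the format raises ValueError
try:
    datetime.datetime.strptime("notes.txt", fmt)
    other_matches = True
except ValueError:
    other_matches = False
print(other_matches)
```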
See the Python documentation:
os.listdir(path='.')
Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.
So you must either use a stricter filter or order the returned list.
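Both suggestions can be combined: filter the listing first, then order it explicitly. The sketch below is one way to do that, using a temporary directory with empty stand-in files so it can run anywhere. The helper name parse_stamp and the pattern "test_report-*-en.zip" are assumptions made for this example.

```python
import datetime
import fnmatch
import os
import tempfile

def parse_stamp(name):
    # Literal text in the format must match the file name exactly
    return datetime.datetime.strptime(name, "test_report-%m-%d-%Y.%H_%M-en.zip")

with tempfile.TemporaryDirectory() as folder:
    # Create empty stand-in files, including one that should be filtered out
    for name in ("test_report-01-13-2014.11_53-en.zip",
                 "test_report-12-04-2013.11_53-en.zip",
                 "readme.txt"):
        open(os.path.join(folder, name), "w").close()

    # Stricter filter first, then an explicit ordering by the parsed date
    reports = fnmatch.filter(os.listdir(folder), "test_report-*-en.zip")
    reports.sort(key=parse_stamp)

print(reports)
```

After the sort, the last element is the newest report regardless of the order in which os.listdir returned the entries.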