numb3rs1x

Reputation: 5203

python: how to sort by datetime in a string?

I have some code that I hoped would pick the newest file out of a directory listing. The file names in the directory look like this:

directory/
    ./file-2014-7-8.info
    ./file-2014-7-9.info
    ./file-2014-7-10.info

The relevant code is this:

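import logging
import os

def get_latest_file(directory):  # enclosing function; name assumed
    filetype = '.info'
    dir_list = os.listdir(directory)
    try:
        # sorted() compares the file names as plain strings
        latest_file = sorted([i for i in dir_list if i.endswith(filetype)])[-1]
        return latest_file
    except Exception as e:
        logging.error("could not find any %s files in the directory: %s" % (filetype, e))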

This code returns the 7-9.info file instead of the 7-10.info file.

How do I get it to return the 7-10 without altering the names of the files themselves? Is there an easy way?

Upvotes: 0

Views: 481

Answers (4)

beetea

Reputation: 308

Build the list of string filenames into a data structure that can be easily sorted. For example, if the date components were treated as ints rather than strs, you'd get what you want. Perhaps something along the lines of:

  [
    ((2014,7,8), './file-2014-7-8.info'),
    ((2014,7,9), './file-2014-7-9.info'),
    ((2014,7,10), './file-2014-7-10.info'),
  ]

There are many ways to get just the date component from the file. Here's one crude way of doing it:

>>> def get_date(f):
...   return list(map(int, f.replace('./file-', '').replace('.info', '').split('-')))

>>> get_date('./file-2014-7-10.info')
[2014, 7, 10]
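The string replacements above only work when the prefix is exactly `./file-`. A regular expression is a less brittle way to pull the date out. This is a minimal sketch, assuming each name ends in `YYYY-M-D` followed by a single extension; the names `DATE_RE` and `get_date_re` are hypothetical:

```python
import re

# Matches year-month-day just before the extension at the end of the name,
# e.g. "2014-7-10" in "./file-2014-7-10.info". Month and day may be 1 or 2 digits.
DATE_RE = re.compile(r'(\d{4})-(\d{1,2})-(\d{1,2})\.[^.]+$')

def get_date_re(fname):
    """Return (year, month, day) as ints, or None if the name has no date."""
    m = DATE_RE.search(fname)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())

print(get_date_re('./file-2014-7-10.info'))  # (2014, 7, 10)
```

Names without a date come back as `None`, so filter those out before using this as a sort key; comparing `None` with a tuple raises `TypeError` in Python 3.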

Now that you have a function to get the date components for each filename, you just have to apply it to all of them:

>>> import pprint
>>> contents = ['./file-2014-7-8.info', './file-2014-7-9.info', './file-2014-7-10.info']
>>> result = [(get_date(f), f) for f in contents]
>>> pprint.pprint(result)
[([2014, 7, 8], './file-2014-7-8.info'),
 ([2014, 7, 9], './file-2014-7-9.info'),
 ([2014, 7, 10], './file-2014-7-10.info')]

If you call sorted on the result with default options, it'll output the list in date-ascending order and you can just grab the last item.
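Put together as a standalone script, sorting the pairs and taking the filename from the last one looks like this (the `list(map(...))` form of `get_date` is used so the date parts are real lists in Python 3):

```python
def get_date(f):
    # './file-2014-7-10.info' -> [2014, 7, 10]
    return list(map(int, f.replace('./file-', '').replace('.info', '').split('-')))

contents = ['./file-2014-7-8.info', './file-2014-7-10.info', './file-2014-7-9.info']
result = [(get_date(f), f) for f in contents]

# Lists compare element by element as ints, so [2014, 7, 10] > [2014, 7, 9].
latest = sorted(result)[-1][1]
print(latest)  # ./file-2014-7-10.info
```

Passing `key=get_date` to `sorted()` or `max()` gives the same answer without building the intermediate pairs.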

Upvotes: 0

numb3rs1x

Reputation: 5203

This was answered using ideas from the comments on the original question. The credit goes to cox, who suggested I look at the natsort package on PyPI. Here is the code, changed to work properly:

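import logging
import os

from natsort import natsorted

def get_latest_file(directory):  # enclosing function; name assumed
    filetype = '.info'
    dir_list = os.listdir(directory)
    try:
        # natsorted() compares the digit runs as numbers, in ascending order
        # (7-8, 7-9, 7-10), so the newest file is the last element.
        latest_file = natsorted([i for i in dir_list if i.endswith(filetype)])[-1]
        return latest_file
    except Exception as e:
        logging.error("could not find any %s files in the directory: %s" % (filetype, e))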
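If adding a third-party dependency is not an option, a basic natural-sort key can be written with the standard library. This is a simplified stand-in, not natsort's actual algorithm: it splits each name into text and digit runs and compares the digit runs as integers. `natural_key` is a hypothetical helper name:

```python
import re

def natural_key(s):
    # re.split with a capturing group keeps the digit runs, so digits always
    # land at odd positions and the text between them at even positions:
    # 'file-2014-7-10.info' -> ['file-', 2014, '-', 7, '-', 10, '.info']
    parts = re.split(r'(\d+)', s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ['file-2014-7-8.info', 'file-2014-7-10.info', 'file-2014-7-9.info']
print(sorted(names, key=natural_key)[-1])  # file-2014-7-10.info
```

Because text and numbers always sit at the same positions, two keys never compare a `str` against an `int`.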

Upvotes: 0

CCKx

Reputation: 1343

You could use a lambda function to parse out the datetime part of the file names while sorting.

import datetime
import logging
import os

def get_latest_file(directory):  # enclosing function; name assumed
    filetype = '.info'
    dir_list = [i for i in os.listdir(directory) if i.endswith(filetype)]
    try:
        # x[5:-5] strips 'file-' and '.info', leaving e.g. '2014-7-10'
        sorted_files = sorted(dir_list, key=lambda x: datetime.datetime.strptime(x[5:-5], "%Y-%m-%d"))
        return sorted_files[-1]
    except Exception as e:
        logging.error("could not find any %s files in the directory: %s" % (filetype, e))
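Sorting the whole list just to take its last element works, but `max()` with the same key finds the newest file in a single pass, and its `default` argument (Python 3.4+) covers the empty-directory case without a `try`/`except`. A sketch assuming the same `file-YYYY-M-D.info` naming; `parse_date` and `latest_info_file` are hypothetical names:

```python
import datetime
import os

def parse_date(name):
    # Strips the 5-character 'file-' prefix and the 5-character '.info' suffix:
    # 'file-2014-7-10.info' -> datetime.datetime(2014, 7, 10, 0, 0)
    return datetime.datetime.strptime(name[5:-5], "%Y-%m-%d")

def latest_info_file(directory, filetype='.info'):
    names = [n for n in os.listdir(directory) if n.endswith(filetype)]
    # max() returns None instead of raising ValueError when nothing matches.
    return max(names, key=parse_date, default=None)
```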

Upvotes: 0

Joran Beasley

Reputation: 113940

import os
import time
fname_2_ts = lambda fname: time.strptime(os.path.basename(fname), "file-%Y-%m-%d.info")
latest_file = sorted([i for i in dir_list if i.endswith(filetype)], key=fname_2_ts)[-1]

The problem was that you were comparing strings, and strings compare character by character: "1" (the first character of "10") is less than both "8" and "9", so "7-10" sorts before "7-8" and "7-9".
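Sorting the three names both ways shows the difference. The key is the same `time.strptime` lambda as above; the `time.struct_time` values it returns compare like tuples, year first, then month, then day:

```python
import os
import time

names = ['file-2014-7-8.info', 'file-2014-7-9.info', 'file-2014-7-10.info']

# Plain string comparison: after the shared prefix 'file-2014-7-',
# '1' < '8' < '9', so the 7-10 file sorts first.
print(sorted(names))
# ['file-2014-7-10.info', 'file-2014-7-8.info', 'file-2014-7-9.info']

fname_2_ts = lambda fname: time.strptime(os.path.basename(fname), "file-%Y-%m-%d.info")
print(sorted(names, key=fname_2_ts))
# ['file-2014-7-8.info', 'file-2014-7-9.info', 'file-2014-7-10.info']
```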

Upvotes: 1
