Reputation: 9
I have a list of filenames sorted by creation date. Each filename contains the file's creation date and time. I am attempting to create a sublist of all files created at or after a certain time.
Full list of files -
Allfilenames = ['CCN-200 data 130321055347.csv',
'CCN-200 data 130321060000.csv',
'CCN-200 data 130321063235.csv',
'CCN-200 data 130321070000.csv',
'CCN-200 data 130321080000.csv',
'CCN-200 data 130321090000.csv',
'CCN-200 data 130321100000.csv',
'CCN-200 data 130321110000.csv',
'CCN-200 data 130321120000.csv',
'CCN-200 data 130321130000.csv',
'CCN-200 data 130321140000.csv',
'CCN-200 data 130321150000.csv']
Positions [19:24] give the time in the format hhmmss. I am using
TOffRound = "080000"
filenames = [s for s in Allfilenames if os.path.basename(s)[19:24] >= TOffRound]
The result should be a list of all filenames created at or after 08:00:00; however, the resulting list is missing the "080000" file.
filenames = ['CCN-200 data 130321090000.csv',
'CCN-200 data 130321100000.csv',
'CCN-200 data 130321110000.csv',
'CCN-200 data 130321120000.csv',
'CCN-200 data 130321130000.csv',
'CCN-200 data 130321140000.csv',
'CCN-200 data 130321150000.csv']
Why is the conditional not returning true on the = part of the condition and returning 'CCN-200 data 130321080000.csv' in my list? Please note I have only shown the basename here for clarity.
Upvotes: 0
Views: 75
Reputation: 4260
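The problem with the code given, as suggested by others, is that you are missing the last digit. When slicing, the "stop" index given after the : is exclusive, so [19:24] returns only five characters ("08000" for the 08:00:00 file). Because a string that is a prefix of another string compares as smaller, "08000" >= "080000" is False and that file is dropped.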
For example:
>>> a = "hello world"
>>> print(a[0:4])
hell
>>> print(a[0:5])
hello
So, change this line in your code and you are good to go:
filenames = [s for s in Allfilenames if os.path.basename(s)[19:25] >= TOffRound]
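To see why the shorter slice drops the 08:00:00 file, here is a small check using one of the filenames from the question. It is only an illustration of the string comparison, not part of the fix itself.

```python
import os

TOffRound = "080000"
name = "CCN-200 data 130321080000.csv"

short = os.path.basename(name)[19:24]  # five characters: the last digit is cut off
full = os.path.basename(name)[19:25]   # six characters: the whole hhmmss field

# A string that is a proper prefix of another compares as smaller,
# so the five-character slice fails the >= test.
print(repr(short), short >= TOffRound)
print(repr(full), full >= TOffRound)
```

The later files still pass with the short slice because their second character ("9", or a leading "1") already decides the comparison before the missing digit matters.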
However, this approach does not scale: hard-coded indices are hard to maintain and break for any file name that is even slightly different. The code can be generalized like this:
import os
def filter_files(file_list, TOffRound):
    text_length = len(TOffRound)
    # Strip the extension first so the slice takes the trailing hhmmss digits, not ".csv".
    return [file_name for file_name in file_list
            if os.path.splitext(file_name)[0][-text_length:] >= TOffRound]
This works irrespective of the length of the file name, as long as the time is the last thing before the extension.
I would also suggest getting the list of files based on their modification time, which is available through os.stat or os.path.getmtime, and acting on that rather than on the file name. A file name is a string, and even though it can tell you which files are older or newer, it is generally not a good idea to use it that way. You are converting a timestamp to a string for the file name, and then that string has to be converted back to a timestamp to be compared. If you go by the file modification time instead, you stay with date and time values and avoid those conversions, which has a few advantages.
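A minimal sketch of the modification-time approach, assuming the files exist on disk and that their modification time is a fair stand-in for the time encoded in the name (which may not hold if the files were copied or edited later). The function name files_modified_at_or_after is made up for this illustration.

```python
import datetime
import os


def files_modified_at_or_after(paths, cutoff):
    """Return the paths whose modification time of day is at or after cutoff.

    cutoff is a datetime.time; modification times are read in local time.
    """
    selected = []
    for path in paths:
        # os.path.getmtime returns seconds since the epoch as a float.
        modified = datetime.datetime.fromtimestamp(os.path.getmtime(path))
        if modified.time() >= cutoff:
            selected.append(path)
    return selected
```

It could be called as files_modified_at_or_after(Allfilenames, datetime.time(8, 0)), provided the entries in Allfilenames are paths that resolve from the current directory.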
Upvotes: 0
Reputation: 5902
Instead of comparing the time part as a string, I would suggest a more robust way to test the time part of your filename: extract the date and time from the filename, parse it, and compare the resulting time against your specified time as a time object.
import re
import datetime

TOffRound = datetime.time(8, 0)
filenames = []
for s in Allfilenames:
    # The 12 consecutive digits are the yymmddhhmmss timestamp.
    datestr = re.search(r"\d{12}", s).group(0)
    dateobj = datetime.datetime.strptime(datestr, "%y%m%d%H%M%S")
    timeobj = dateobj.time()
    if timeobj >= TOffRound:
        filenames.append(s)
Upvotes: 1
Reputation: 64
In your filenames, hhmmss occupies the slice 19:25 rather than 19:24. So the correct statement to get hhmmss from the filename is:
filenames = [s for s in Allfilenames if os.path.basename(s)[19:25] >= TOffRound]
Upvotes: 0