Reputation: 868
I have a script which gets all the files from a particular location, but I need to fetch only the latest ones. The script should return the latest files present at that location.
E.g. I have a location which contains files named as below:
DataLogs_20141125_AP.CSV
DataLogs_20141125_UK_EARLY.CSV
DataLogs_20141125_CAN.CSV
DataLogs_20141125_US.CSV
DataLogs_20141125_EUR.CSV
DataLogs_20141125_US_2.CSV
DataLogs_20141126_AP.CSV
DataLogs_20141126_UK_EARLY.CSV
DataLogs_20141126_CAN.CSV
DataLogs_20141126_US.CSV
DataLogs_20141126_EUR.CSV
DataLogs_20141126_US_2.CSV
I want to fetch only the latest files, e.g. here the files matching the "20141126" pattern are the latest ones.
I tried with a match pattern, but it gives me all the files:
filematch ='DataLogs_2014_*.CSV'
Upvotes: 0
Views: 66
Reputation: 43860
You could also use itertools.groupby to group the files by the date in the filename.
from itertools import groupby
file_list = ['DataLogs_20141125_AP.CSV', 'DataLogs_20141125_UK_EARLY.CSV', 'DataLogs_20141125_CAN.CSV', 'DataLogs_20141125_US.CSV', 'DataLogs_20141125_EUR.CSV', 'DataLogs_20141125_US_2.CSV', 'DataLogs_20141126_AP.CSV',
'DataLogs_20141126_UK_EARLY.CSV','DataLogs_20141126_CAN.CSV', 'DataLogs_20141126_US.CSV','DataLogs_20141126_EUR.CSV', 'DataLogs_20141126_US_2.CSV']
def group_key_func(value):
    """Return the date part of the filename, used as the grouping key."""
    return value.split("_")[1]  # pulls out '20141126' from 'DataLogs_20141126_CAN.CSV'
# groupby only merges adjacent items, so sort by the same key first
groups = groupby(sorted(file_list, key=group_key_func), key=group_key_func)
newest_date, newest_files = max((group_key, list(group)) for group_key, group in groups)
Newest date, files result:
20141126:
DataLogs_20141126_AP.CSV
DataLogs_20141126_UK_EARLY.CSV
DataLogs_20141126_CAN.CSV
DataLogs_20141126_US.CSV
DataLogs_20141126_EUR.CSV
DataLogs_20141126_US_2.CSV
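Since groupby only merges adjacent items, the input has to be ordered by the grouping key for it to work. If you'd rather not depend on ordering at all, a plain dict can collect the groups instead. This is only a rough sketch: the helper name group_by_date is made up for illustration, and it assumes every name has the date as its second "_"-separated field.

```python
from collections import defaultdict


def group_by_date(file_names):
    """Map each date string (second '_'-separated field) to its file names."""
    groups = defaultdict(list)
    for name in file_names:
        groups[name.split("_")[1]].append(name)
    return groups


# Deliberately unordered input to show that ordering does not matter here
files = [
    "DataLogs_20141126_AP.CSV",
    "DataLogs_20141125_AP.CSV",
    "DataLogs_20141126_US_2.CSV",
    "DataLogs_20141125_US.CSV",
]

groups = group_by_date(files)
newest_date = max(groups)  # YYYYMMDD strings sort chronologically
newest_files = groups[newest_date]
print(newest_date, newest_files)
```

Taking max over the dict keys works because fixed-width YYYYMMDD strings compare in the same order as the dates they represent.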
Upvotes: 0
Reputation: 238517
You can do as follows:
data = """DataLogs_20141125_AP.CSV
DataLogs_20141125_UK_EARLY.CSV
DataLogs_20141125_CAN.CSV
DataLogs_20141125_US.CSV
DataLogs_20141125_EUR.CSV
DataLogs_20141125_US_2.CSV
DataLogs_20141126_AP.CSV
DataLogs_20141126_UK_EARLY.CSV
DataLogs_20141126_CAN.CSV
DataLogs_20141126_US.CSV
DataLogs_20141126_EUR.CSV
DataLogs_20141126_US_2.CSV"""
print(list(fname for fname in data.split() if '20141126' in fname))
Gives:
['DataLogs_20141126_AP.CSV', 'DataLogs_20141126_UK_EARLY.CSV', 'DataLogs_20141126_CAN.CSV', 'DataLogs_20141126_US.CSV', 'DataLogs_20141126_EUR.CSV', 'DataLogs_20141126_US_2.CSV']
For a more general solution, i.e. one that searches for the latest date, you can do as @user3 recommends.
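To tie this back to files on disk, here is a minimal sketch of the general approach. It assumes the files sit in a single directory and are all named like DataLogs_YYYYMMDD_*.CSV. The function name latest_files and its parameters are hypothetical, not something from the question.

```python
import glob
import os


def latest_files(directory, pattern="DataLogs_*.CSV"):
    """Return (latest_date, names) for files named DataLogs_YYYYMMDD_*.CSV.

    Assumes the date is the second '_'-separated field of every match.
    """
    paths = glob.glob(os.path.join(directory, pattern))
    names = [os.path.basename(path) for path in paths]
    if not names:
        return None, []
    # Fixed-width YYYYMMDD strings compare in chronological order
    latest = max(name.split("_")[1] for name in names)
    return latest, sorted(n for n in names if n.split("_")[1] == latest)
```

You would call it as latest_files(some_directory) and get back the newest date string together with the matching file names, or (None, []) if nothing matches the pattern.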
Upvotes: 0
Reputation: 4318
You could do this:
Find the latest date, then get all the files whose names contain it:
fileList = ['DataLogs_20141125_AP.CSV', 'DataLogs_20141125_UK_EARLY.CSV', 'DataLogs_20141125_CAN.CSV', 'DataLogs_20141125_US.CSV', 'DataLogs_20141125_EUR.CSV', 'DataLogs_20141125_US_2.CSV', 'DataLogs_20141126_AP.CSV',
'DataLogs_20141126_UK_EARLY.CSV','DataLogs_20141126_CAN.CSV', 'DataLogs_20141126_US.CSV','DataLogs_20141126_EUR.CSV', 'DataLogs_20141126_US_2.CSV']
# the largest YYYYMMDD string is the latest date
latest = max(x.split('_')[1] for x in fileList)
print([x for x in fileList if latest in x])
Output:
['DataLogs_20141126_AP.CSV', 'DataLogs_20141126_UK_EARLY.CSV', 'DataLogs_20141126_CAN.CSV', 'DataLogs_20141126_US.CSV', 'DataLogs_20141126_EUR.CSV', 'DataLogs_20141126_US_2.CSV']
Upvotes: 2