Rohit

Reputation: 868

Get the latest files from Location using Python

I have a script that gets all the files from a particular location, but I need to fetch only the latest ones: the script should return the latest files present at that location.

For example, the location contains files named as follows:

DataLogs_20141125_AP.CSV   
DataLogs_20141125_UK_EARLY.CSV  
DataLogs_20141125_CAN.CSV  
DataLogs_20141125_US.CSV 
DataLogs_20141125_EUR.CSV  
DataLogs_20141125_US_2.CSV 
DataLogs_20141126_AP.CSV   
DataLogs_20141126_UK_EARLY.CSV
DataLogs_20141126_CAN.CSV  
DataLogs_20141126_US.CSV
DataLogs_20141126_EUR.CSV  
DataLogs_20141126_US_2.CSV

I want to fetch only the latest files; e.g., the files matching the "20141126" pattern are the latest ones.

I tried matching on a pattern, but it gives me all the files:

filematch = 'DataLogs_2014_*.CSV'
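The answers below work on a hard-coded list of names. Here is a minimal sketch of reading the names from the location itself. It assumes they follow the DataLogs_YYYYMMDD_<suffix>.CSV scheme shown above. The regex, the function names newest_by_date and latest_files, and the directory argument are illustrative and not from the original:

```python
import os
import re

# Assumption: names follow the question's scheme DataLogs_YYYYMMDD_<suffix>.CSV.
DATED_NAME = re.compile(r"^DataLogs_(\d{8})_.+\.CSV$")


def newest_by_date(names):
    """Return the names whose embedded YYYYMMDD date is the newest, in input order."""
    dated = []
    for name in names:
        match = DATED_NAME.match(name)
        if match:  # skip files that do not follow the naming scheme
            dated.append((match.group(1), name))
    if not dated:
        return []
    # YYYYMMDD strings sort chronologically, so a plain string max() is enough.
    newest = max(date for date, _ in dated)
    return [name for date, name in dated if date == newest]


def latest_files(directory):
    """List `directory` (a path you supply) and keep only the newest-dated files."""
    return newest_by_date(sorted(os.listdir(directory)))


if __name__ == "__main__":
    sample = ["DataLogs_20141125_AP.CSV", "DataLogs_20141126_AP.CSV",
              "DataLogs_20141126_US_2.CSV", "readme.txt"]
    print(newest_by_date(sample))
```

Because YYYYMMDD dates compare correctly as strings, no date parsing is needed. Names that do not fit the scheme are skipped rather than raising an error.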

Upvotes: 0

Views: 66

Answers (3)

monkut

Reputation: 43860

You could also use itertools.groupby to group files by the date in the filename.

from itertools import groupby

file_list = ['DataLogs_20141125_AP.CSV', 'DataLogs_20141125_UK_EARLY.CSV',  'DataLogs_20141125_CAN.CSV',  'DataLogs_20141125_US.CSV', 'DataLogs_20141125_EUR.CSV',  'DataLogs_20141125_US_2.CSV', 'DataLogs_20141126_AP.CSV',
    'DataLogs_20141126_UK_EARLY.CSV','DataLogs_20141126_CAN.CSV',  'DataLogs_20141126_US.CSV','DataLogs_20141126_EUR.CSV',  'DataLogs_20141126_US_2.CSV']

def group_key_func(value):
    """Function to pull out and return the key value to group by in the filename"""
    return value.split("_")[1]  # pulls out '20141126' in 'DataLogs_20141126_CAN.CSV'

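# groupby() only merges consecutive items, so sort by the same key first
groups = groupby(sorted(file_list, key=group_key_func), key=group_key_func)
newest_date, newest_files = sorted([(group_key, list(group)) for group_key, group in groups], reverse=True)[0]

print(newest_date + ':')
print('\n'.join(newest_files))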

Newest date, files result:

20141126: 
DataLogs_20141126_AP.CSV
DataLogs_20141126_UK_EARLY.CSV
DataLogs_20141126_CAN.CSV
DataLogs_20141126_US.CSV
DataLogs_20141126_EUR.CSV
DataLogs_20141126_US_2.CSV
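itertools.groupby only merges consecutive items with equal keys. This approach therefore depends on same-date files being adjacent, which is true of the list above. The following small demonstration uses deliberately interleaved names. date_key is a hypothetical helper equivalent to group_key_func:

```python
from itertools import groupby


def date_key(name):
    """Pull '20141126' out of 'DataLogs_20141126_CAN.CSV'."""
    return name.split("_")[1]


# Deliberately interleaved: the two 20141126 files are not adjacent.
files = ["DataLogs_20141126_AP.CSV", "DataLogs_20141125_AP.CSV", "DataLogs_20141126_US.CSV"]

# Without sorting, groupby starts a new group every time the key changes.
unsorted_groups = [(key, list(group)) for key, group in groupby(files, key=date_key)]

# Sorting by the same key first makes each date a single group.
sorted_groups = [(key, list(group))
                 for key, group in groupby(sorted(files, key=date_key), key=date_key)]

print(unsorted_groups)  # three groups, with 20141126 split in two
print(sorted_groups)    # two groups, in ascending date order
newest_date, newest_files = sorted_groups[-1]
print(newest_date, newest_files)
```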

Upvotes: 0

Marcin

Reputation: 238517

You can do as follows:

data = """DataLogs_20141125_AP.CSV   
DataLogs_20141125_UK_EARLY.CSV  
DataLogs_20141125_CAN.CSV  
DataLogs_20141125_US.CSV 
DataLogs_20141125_EUR.CSV  
DataLogs_20141125_US_2.CSV 
DataLogs_20141126_AP.CSV   
DataLogs_20141126_UK_EARLY.CSV
DataLogs_20141126_CAN.CSV  
DataLogs_20141126_US.CSV
DataLogs_20141126_EUR.CSV  
DataLogs_20141126_US_2.CSV"""


print([fname for fname in data.split() if '20141126' in fname])

Gives:

['DataLogs_20141126_AP.CSV', 'DataLogs_20141126_UK_EARLY.CSV', 'DataLogs_20141126_CAN.CSV', 'DataLogs_20141126_US.CSV', 'DataLogs_20141126_EUR.CSV', 'DataLogs_20141126_US_2.CSV']
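A shell-style wildcard does the same fixed-date filtering and is closer to the question's filematch attempt. The sketch below uses fnmatch.filter on a shortened list of names, shortened only for brevity. Under fnmatch rules, the question's 'DataLogs_2014_*.CSV' needs an underscore right after '2014', so it matches none of these names. 'DataLogs_2014*.CSV' matches every 2014 file, and only a pattern containing the full date narrows the result to one day:

```python
import fnmatch

names = ["DataLogs_20141125_AP.CSV", "DataLogs_20141125_US_2.CSV",
         "DataLogs_20141126_AP.CSV", "DataLogs_20141126_US_2.CSV"]

# '*' matches any run of characters; everything else must match literally.
latest = fnmatch.filter(names, "DataLogs_20141126_*.CSV")
print(latest)

# With only the year in the pattern, files from every 2014 date match.
print(fnmatch.filter(names, "DataLogs_2014*.CSV"))
```

Like the substring test, this still hard-codes the date, so it does not find the newest one by itself.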

For a more general solution, i.e. one that searches for the latest date, you can do as @user3 recommends.

Upvotes: 0

venpa

Reputation: 4318

You could do this:

  1. Get the latest date by splitting each file name on '_' and taking the first element of the reverse-sorted dates.
  2. Using that date, get all the files whose names contain it.

    fileList = ['DataLogs_20141125_AP.CSV', 'DataLogs_20141125_UK_EARLY.CSV',  'DataLogs_20141125_CAN.CSV',  'DataLogs_20141125_US.CSV', 'DataLogs_20141125_EUR.CSV',  'DataLogs_20141125_US_2.CSV', 'DataLogs_20141126_AP.CSV',
        'DataLogs_20141126_UK_EARLY.CSV','DataLogs_20141126_CAN.CSV',  'DataLogs_20141126_US.CSV','DataLogs_20141126_EUR.CSV',  'DataLogs_20141126_US_2.CSV']
    latest = sorted(map(lambda x: x.split('_')[1], fileList), reverse=True)[0]
    print(list(filter(lambda x: latest in x, fileList)))
    

Output:

['DataLogs_20141126_AP.CSV', 'DataLogs_20141126_UK_EARLY.CSV', 'DataLogs_20141126_CAN.CSV', 'DataLogs_20141126_US.CSV', 'DataLogs_20141126_EUR.CSV', 'DataLogs_20141126_US_2.CSV']
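The two steps can also be written with max() in place of the reverse sort, which avoids picking the wrong index. When only one file carries the newest date, element [1] of the reverse-sorted dates is already an older date. The following is a minimal sketch. It assumes every name has its date as the second '_'-separated field, and latest_files is a hypothetical name:

```python
def latest_files(file_list):
    """Return the files whose date field (after the first '_') is the newest."""
    if not file_list:
        return []  # max() would raise ValueError on an empty sequence
    # Step 1: the newest date is the maximum YYYYMMDD field; no sorting or indexing.
    latest = max(name.split("_")[1] for name in file_list)
    # Step 2: compare the date field exactly instead of searching the whole name.
    return [name for name in file_list if name.split("_")[1] == latest]


files = ["DataLogs_20141125_AP.CSV", "DataLogs_20141125_US.CSV", "DataLogs_20141126_AP.CSV"]
print(latest_files(files))  # ['DataLogs_20141126_AP.CSV']
```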

Upvotes: 2
