Luis Medina

Reputation: 555

Python get latest files from list

I have a list of files like this:

my_list=['l.txt','PPT_6_202008062343HLC.txt','PPT_6_202008070522HLC.txt','PPT_12_202008062343HLC.txt','PPT_12_202008070522HLC.txt']

and I want a final list that keeps only the latest file for each of the PPT_6 and PPT_12 prefixes, plus all the other items, like this:

final_list=
['PPT_6_202008070522HLC.txt', 'PPT_12_202008070522HLC.txt', 'l.txt']

Right now I'm doing this:

from datetime import datetime
now = datetime.now()

new_arc=[]
time_6=[]
time_12=[]
for i in my_list:
    if i[4:5]=='6':
        time_6.append(i)
    elif i[4:5]=='1':
        time_12.append(i)
    else:
        new_arc.append(i)

# the 12-digit timestamp sits just before "HLC.txt", i.e. at t[-19:-7]
time_6 = [max(t for t in time_6 if datetime.strptime(t[-19:-7], '%Y%m%d%H%M') < now)]
time_12 = [max(t for t in time_12 if datetime.strptime(t[-19:-7], '%Y%m%d%H%M') < now)]

final_list=time_6+time_12+new_arc

Is there a better way of doing this?

Upvotes: 0

Views: 102

Answers (3)

Algebra8

Reputation: 1355

The best I could come up with was this:

import re

my_list = [
    'l.txt','PPT_6_202008062343HLC.txt','PPT_6_202008070522HLC.txt',
    'PPT_12_202008062343HLC.txt','PPT_12_202008070522HLC.txt'
]
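patterns = (re.compile("PPT_6_"), re.compile("PPT_12_"))

# the timestamps sort lexicographically, so max() returns the latest file
final_list = [max(filter(pattern.match, my_list)) for pattern in patterns]
# keep every name that does not start with "PPT"
final_list += [name for name in my_list if not name.startswith("PPT")]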

Each pattern scans the list once, so unless you're working with a very large number of file names, I don't think it should be too slow.

Upvotes: 0

thanasisp

Reputation: 5975

The timestamp format in these filenames means you don't need datetime functions; alphabetical order is enough.

You can remove all items matching the two patterns and then append the most recent file for each pattern, which is its alphabetically greatest element.

p1 = [x for x in my_list if x.startswith("PPT_6")]
p2 = [x for x in my_list if x.startswith("PPT_12")]

result = [x for x in my_list if x not in p1 and x not in p2]
result.append(max(p1))
result.append(max(p2))

print(result)
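
To illustrate why alphabetical order is enough here, this is a small sketch that compares the two approaches. The helper name `parse_stamp` is hypothetical, and the slice assumes the 12-digit `YYYYMMDDHHMM` stamp is always followed by `HLC.txt`, as in the question's examples:

```python
from datetime import datetime

names = ['PPT_6_202008062343HLC.txt', 'PPT_6_202008070522HLC.txt']

def parse_stamp(name):
    # hypothetical helper: the 12-digit stamp sits just before "HLC.txt"
    return datetime.strptime(name[-19:-7], '%Y%m%d%H%M')

# plain string comparison on names that share a prefix
latest_by_name = max(names)
# comparison on the parsed timestamps
latest_by_time = max(names, key=parse_stamp)

print(latest_by_name == latest_by_time)
```

This only holds because the stamp is fixed-width and ordered from year down to minute; a format like `DDMMYYYY` would not sort correctly as text.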

Upvotes: 1

jignatius

Reputation: 6474

Since the file names already sort in date order, you can simply sort them in reverse, group by the prefix (PPT_6 and PPT_12), and take the first item from each group.

from itertools import groupby

# get the prefix up to the nth underscore
def split_nth(text, n):
    grp = text.split('_')
    return '_'.join(grp[:n])

my_list =['l.txt','PPT_6_202008062343HLC.txt','PPT_6_202008070522HLC.txt',
          'PPT_12_202008062343HLC.txt','PPT_12_202008070522HLC.txt']

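# skip 'l.txt' (index 0); the reverse sort puts the newest file first in each group
sorted_list = sorted(my_list[1:], reverse=True)
groups = groupby(sorted_list, key=lambda x: split_nth(x, 2))
# the first item of each group is the latest file for that prefix
result = [next(v) for _, v in groups]
result.append(my_list[0])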
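
The code above relies on 'l.txt' being the first element of the list. If that can't be guaranteed, one possible variation is a single pass that keeps the newest name per prefix in a dict. This is only a sketch: the regular expression `stamped` and the names `latest` and `others` are my own, and the pattern assumes every timestamped file looks like `<prefix>_<12 digits>HLC.txt`:

```python
import re

my_list = ['l.txt', 'PPT_6_202008062343HLC.txt', 'PPT_6_202008070522HLC.txt',
           'PPT_12_202008062343HLC.txt', 'PPT_12_202008070522HLC.txt']

# assumed naming scheme: <prefix>_<12-digit timestamp>HLC.txt
stamped = re.compile(r'(?P<prefix>.+)_(?P<stamp>\d{12})HLC\.txt')

latest = {}   # prefix -> newest file name seen so far
others = []   # names that do not follow the scheme

for name in my_list:
    m = stamped.fullmatch(name)
    if m is None:
        others.append(name)
        continue
    prefix = m.group('prefix')
    # names with the same prefix differ only in the fixed-width stamp,
    # so comparing the names compares the timestamps
    if prefix not in latest or name > latest[prefix]:
        latest[prefix] = name

result = list(latest.values()) + others
print(result)
```

Because dicts preserve insertion order, the prefixes come out in the order they first appear in the input, followed by the untouched names.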

Upvotes: 1
