Filtering a list containing date Strings

Question

Below is an example list in which each element has a name (for example, XXX) and an associated date (for example, 20200115):

[XXX_20200115, XXX_20200116, YYY_20200116, ZZZ_20200116, ZZZ_20200117]

I want to remove every element for which another element with the same name has a later date. For example, XXX_20200115 should be removed because XXX_20200116 exists with a later date.

So my final output should be:

[XXX_20200116, YYY_20200116, ZZZ_20200117]

So far I have written this code:

from collections import defaultdict

def list_duplicates(seq):
    tally = defaultdict(list)
    for i,item in enumerate(seq):
        tally[item].append(i)
    return ((key,locs) for key,locs in tally.items() 
                            if len(locs)>1)


def filterModules(mylist):
    names = []
    timestamps =  []

    for module in mylist:
        splittedName = module.rsplit('_', 1)
        names.append(splittedName[0])
        timestamps.append(splittedName[1])

    duplicates = []
    for dup in sorted(list_duplicates(names)):
        duplicate = {}
        duplicate['name'] = dup[0]
        duplicate['indexs'] = dup[1]
        duplicates.append(duplicate)
    return duplicates

This gives me the duplicated names with their indexes.

My issue is that I was aiming for minimal code, but my code keeps growing and I seem to be approaching the problem inefficiently. Can someone suggest a more efficient way to solve this with minimal code?

Vikas Periyadath · Accepted Answer

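First group the elements by the name before the underscore, then take max() of each group. Every string in a group shares the same name and the dates are fixed-width YYYYMMDD, so plain string comparison orders them the same way as the dates. Note that groupby only merges adjacent items, so the list must be ordered by name, as it is here; sort it first otherwise: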

from itertools import groupby

l1 = ["XXX_20200115", "XXX_20200116", "YYY_20200116", "ZZZ_20200116", "ZZZ_20200117"]

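def name_of(x):
    # rsplit keeps names that themselves contain underscores intact
    return x.rsplit("_", 1)[0]

# groupby only merges adjacent items, so sort by name before grouping
l2 = [list(g) for k, g in groupby(sorted(l1, key=name_of), key=name_of)]

# within a group the names are identical, so max() picks the latest date
new_l = [max(i) for i in l2]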

print(new_l)
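
If the list is not ordered by name, a single pass with a dictionary avoids the sort. The following is only a minimal sketch, not part of the answer above: keep_latest is a hypothetical helper name, and it assumes every element has the form NAME_YYYYMMDD with a fixed-width date, so comparing the date strings is equivalent to comparing the dates.

```python
def keep_latest(items):
    """Return one entry per name, keeping the one with the latest date."""
    latest = {}  # name -> latest date string seen so far
    for item in items:
        # Split on the last underscore only, so names may contain underscores.
        name, date = item.rsplit("_", 1)
        # Assumption: dates are fixed-width YYYYMMDD strings, which compare
        # in the same order as the dates they encode.
        if name not in latest or date > latest[name]:
            latest[name] = date
    # Dicts preserve insertion order (Python 3.7+), so names come out in
    # the order they first appeared in the input.
    return [f"{name}_{date}" for name, date in latest.items()]


modules = ["ZZZ_20200117", "XXX_20200115", "YYY_20200116", "XXX_20200116", "ZZZ_20200116"]
print(keep_latest(modules))
```

With this unordered input the result keeps one entry per name, in order of first appearance. If the dates could vary in width or format, parsing them (for example with datetime.strptime) before comparing would be the safer choice.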
