Reputation: 686
Below is an example list where each element has a name (for example XXX) and an associated date (for example 20200115):
[XXX_20200115, XXX_20200116, YYY_20200116, ZZZ_20200116, ZZZ_20200117]
I want to remove all the elements from the list which have the same name but an older date. For example, I want to remove XXX_20200115 because XXX_20200116 already exists with a later date.
So my final output should be:
[XXX_20200116, YYY_20200116, ZZZ_20200117]
So far I have written this code:
from collections import defaultdict

def list_duplicates(seq):
    tally = defaultdict(list)
    for i, item in enumerate(seq):
        tally[item].append(i)
    return ((key, locs) for key, locs in tally.items()
            if len(locs) > 1)

def filterModules(mylist):
    names = []
    timestamps = []
    for module in mylist:
        splittedName = module.rsplit('_', 1)
        names.append(splittedName[0])
        timestamps.append(splittedName[1])
    duplicates = []
    for dup in sorted(list_duplicates(names)):
        duplicate = {}
        duplicate['name'] = dup[0]
        duplicate['indexs'] = dup[1]
        duplicates.append(duplicate)
This gives me the duplicated names with their indexes.
My issue is that I was aiming to write minimal code for this problem, but my code keeps growing and it seems I am approaching the problem inefficiently. Can someone suggest a more efficient way of solving this with minimal code?
Upvotes: 0
Views: 223
Reputation: 3186
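First group the elements by name (the part before the underscore), then take max() of each sub-list. The dates are fixed-width digit strings (YYYYMMDD), so Python's string comparison orders them the same way a numeric comparison would. Note that groupby only groups consecutive elements, so the list must already be ordered by name, as it is here: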
from itertools import groupby
l1 = ["XXX_20200115", "XXX_20200116", "YYY_20200116", "ZZZ_20200116", "ZZZ_20200117"]
l2 = [list(g) for k, g in groupby(l1, key=lambda x: x.split("_")[0])]
new_l = [max(i) for i in l2]
print(new_l)
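If the input is not guaranteed to arrive ordered by name, one option is to sort it before grouping. The following is only a sketch under that assumption: the helper names `name_of` and `latest_per_name` and the `unsorted` sample list are made up for illustration, and it splits on the last underscore in case a name itself contains one.

```python
from itertools import groupby

def name_of(item):
    # Everything before the last underscore is treated as the name
    return item.rsplit("_", 1)[0]

def latest_per_name(items):
    # Sort by name first: groupby only merges runs of adjacent equal keys
    ordered = sorted(items, key=name_of)
    # Within one name the prefix is identical, so max() picks the latest YYYYMMDD date
    return [max(group) for _, group in groupby(ordered, key=name_of)]

unsorted = ["ZZZ_20200116", "XXX_20200115", "YYY_20200116", "ZZZ_20200117", "XXX_20200116"]
print(latest_per_name(unsorted))
# ['XXX_20200116', 'YYY_20200116', 'ZZZ_20200117']
```

The result comes back ordered by name rather than in the original input order, which may or may not matter for your use case.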
Upvotes: 3
Reputation: 1098
I tried using a dictionary for this, where XXX, YYY, etc. are the keys and the dates are the values. This is how the implementation looks:
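dt = ['XXX_20200115', 'XXX_20200116', 'YYY_20200116', 'ZZZ_20200116', 'ZZZ_20200117']
# Split on the last underscore only, so names that contain underscores stay intact
dt = [tuple(i.rsplit('_', 1)) for i in dt]
new_dt = {}
for i, j in dt:
    if i not in new_dt:
        # First time this name is seen
        new_dt[i] = j
    else:
        # YYYYMMDD strings compare in chronological order
        if j > new_dt[i]:
            new_dt[i] = j
print(new_dt)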
This gives:
{'XXX': '20200116', 'YYY': '20200116', 'ZZZ': '20200117'}
Finally, if you want to convert back to the original format, you can join each key with its value and build a list:
new_dt = ["{}_{}".format(i,new_dt[i]) for i in new_dt]
This gives:
['XXX_20200116', 'YYY_20200116', 'ZZZ_20200117']
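Since the question asks for minimal code, the same dictionary idea can be folded into one small function. This is just a sketch: the function name `keep_latest` is hypothetical, and it assumes every element ends in `_YYYYMMDD` and that you are on Python 3.7+, where dicts keep insertion order.

```python
def keep_latest(items):
    latest = {}
    for item in items:
        # Split on the last underscore so names may contain underscores themselves
        name, date = item.rsplit("_", 1)
        # Fixed-width YYYYMMDD strings compare in chronological order
        if name not in latest or date > latest[name]:
            latest[name] = date
    # Rebuild the original "name_date" format, in order of first appearance
    return [f"{name}_{date}" for name, date in latest.items()]

dt = ['XXX_20200115', 'XXX_20200116', 'YYY_20200116', 'ZZZ_20200116', 'ZZZ_20200117']
print(keep_latest(dt))
# ['XXX_20200116', 'YYY_20200116', 'ZZZ_20200117']
```

Unlike a groupby-based approach, this works in a single pass and does not need the input to be sorted by name.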
Upvotes: 0