Ram

Reputation: 557

Python: Find list elements where part of the string is duplicated. A working approach (just the logic) would be sufficient

List1 = ['ABCD_123.A_062320_082824', 'ABCD_123.A_062320_094024','ABCD_123.A_063020_084447']

I want to keep only the last element, since it has the latest timestamp (format MonthDayYear_HourMinuteSecond, i.e. MMDDYY_HHMMSS).
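As a rough sketch, the timestamp can be split off and parsed with the standard library's `datetime.strptime`. This assumes the last two '_'-separated fields are always MMDDYY and HHMMSS; `parse_timestamp` is a hypothetical helper name, not something from the question:

```python
from datetime import datetime

def parse_timestamp(element):
    """Hypothetical helper: return (name, datetime) for a 'name_MMDDYY_HHMMSS' string."""
    # rsplit with maxsplit=2 keeps any '_' inside the name itself intact
    name, date_part, time_part = element.rsplit('_', 2)
    return name, datetime.strptime(f'{date_part}_{time_part}', '%m%d%y_%H%M%S')

name, stamp = parse_timestamp('ABCD_123.A_063020_084447')
print(name, stamp)  # ABCD_123.A 2020-06-30 08:44:47
```

Comparing the resulting `datetime` objects is chronological even across years, which comparing the raw MMDDYY strings is not.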


Method 1

names = []
for name in List1:
    parts = name.split('_')
    names.append(parts[0])  # text before the first '_', e.g. 'ABCD'
    Day = parts[-2]         # date field, MMDDYY
    Time = parts[-1]        # time field, HHMMSS
    print(names, Day, Time)

Method 2 
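import re

names = []
for name in List1:
    # group 1: prefix, group 2: '_123.A_', group 3: date (MMDDYY), group 4: time (HHMMSS)
    namematch = re.search(r'^([a-zA-Z0-9]*)(_\d*\.A_)(\d{6})_(\d{6})', name)
    names.append(namematch.group(1))

print(names)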

I tried a regex, which works, but I don't know how to compare the corresponding groups. Do I use an if condition that checks the date and time groups and keeps group 1, or something along those lines?
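One way to write that "if condition" is sketched below. It assumes a simplified pattern (not the one from Method 2) where group 1 is the name, group 2 the date and group 3 the time, and it keeps a dictionary from each name to the newest element seen so far; `latest_by_regex` is a hypothetical name. The date is reordered from MMDDYY to YYMMDD so that plain string comparison is chronological, which is valid only as long as all years fall in the same century:

```python
import re

# Assumed pattern: group 1 = name, group 2 = date (MMDDYY), group 3 = time (HHMMSS)
PATTERN = re.compile(r'^(.*)_(\d{6})_(\d{6})$')

def latest_by_regex(elements):
    latest = {}  # name -> (sortable timestamp string, original element)
    for element in elements:
        m = PATTERN.match(element)
        if m is None:
            continue  # skip elements that don't follow name_date_time
        name, date, time = m.groups()
        # Reorder MMDDYY to YYMMDD so string comparison follows the calendar
        stamp = date[4:] + date[:4] + time
        # Keep this element only if its name is new or it is newer than the stored one
        if name not in latest or stamp > latest[name][0]:
            latest[name] = (stamp, element)
    return [element for _, element in latest.values()]

List1 = ['ABCD_123.A_062320_082824', 'ABCD_123.A_062320_094024', 'ABCD_123.A_063020_084447']
print(latest_by_regex(List1))  # ['ABCD_123.A_063020_084447']
```

Because dictionaries keep insertion order, the result lists names in the order they first appear in the input.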

Upvotes: 2

Views: 59

Answers (1)

Ehsan

Reputation: 12397

Something like this should work (assuming each element has the structure name_date_time):

from itertools import groupby
out = [sorted(list(v))[-1] for k,v in groupby(sorted(List1), key=lambda x: '_'.join(x.split('_')[:-2]))]

Explanation:

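  • Split each element on '_', drop the date and time fields, and join the remaining fields with '_' to form the name
  • Sort the list so that equal names are adjacent, use groupby to group by name, and then sort each group
  • Select the last element of each sorted group (after sorting, the latest date and time come last; this holds here because all dates share the same year, since MMDDYY strings do not sort chronologically across years)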
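If the timestamps can span more than one year, sorting the raw strings can pick the wrong element (for example '123119' sorts after '010120'). Below is a sketch that keeps the groupby idea but compares parsed datetimes instead, assuming the same name_MMDDYY_HHMMSS structure; the helper names are hypothetical:

```python
from datetime import datetime
from itertools import groupby

def name_of(element):
    # Everything before the last two '_'-separated fields (date and time)
    return element.rsplit('_', 2)[0]

def timestamp_of(element):
    # Assumes the last two fields are MMDDYY and HHMMSS
    _, date_part, time_part = element.rsplit('_', 2)
    return datetime.strptime(f'{date_part}_{time_part}', '%m%d%y_%H%M%S')

def latest_by_groupby(elements):
    # Sort by the same key that groupby uses, so equal names are always adjacent
    ordered = sorted(elements, key=name_of)
    # Within each group, pick the element with the newest parsed timestamp
    return [max(group, key=timestamp_of) for _, group in groupby(ordered, key=name_of)]

print(latest_by_groupby(['X_123119_235959', 'X_010120_000000']))  # ['X_010120_000000']
```

Sorting with `key=name_of` rather than on the full strings also guarantees that `groupby` sees each name as one contiguous run.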

Output (note that the order of elements in the result can differ from the input order; if you need to keep it, record the order in which the names first appear and reorder the result by that):

['ABCD_123.A_063020_084447']
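The reordering mentioned in the note could look like this. It is a sketch that reuses the one-liner from this answer and then sorts the result by the index at which each name first appears in the input; `name_of` and `first_seen` are illustrative names:

```python
from itertools import groupby

List1 = ['ABCE_123.A_062320_082824', 'ABCE_123.A_062320_094024', 'ABCD_123.A_063020_084447']

def name_of(element):
    # name = everything before the last two '_' fields (date, time)
    return '_'.join(element.split('_')[:-2])

# Result of the groupby approach (sorted by name, so input order is lost)
out = [sorted(list(v))[-1] for k, v in groupby(sorted(List1), key=name_of)]

# Remember the index at which each name first appears in the input
first_seen = {}
for index, element in enumerate(List1):
    first_seen.setdefault(name_of(element), index)

# Reorder the result by that first-appearance index
out.sort(key=lambda element: first_seen[name_of(element)])
print(out)  # ['ABCE_123.A_062320_094024', 'ABCD_123.A_063020_084447']
```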

Another example:

List1 = ['ABCE_123.A_062320_082824', 'ABCE_123.A_062320_094024','ABCD_123.A_063020_084447']

out:

['ABCD_123.A_063020_084447', 'ABCE_123.A_062320_094024']

Upvotes: 1
