Reputation: 1597
I have the following list as input:
['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
In the output I want to exclude 'temp/date=22-07-2019/' since it's part of 'temp/date=22-07-2019/temp=22-07-2019/'. Hence the output should be:
['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
I have tried several ways but was not able to achieve this. Please suggest. Thanks
Upvotes: -1
Views: 164
Reputation: 8589
This solution also takes care of identical duplicates by creating a set:
example_data = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
# Creating a set gets rid of all identical entries
unique_data = set(example_data)
# Keep every long entry, i.e. anything longer than the
# 21-character 'temp/date=DD-MM-YYYY/' prefix
result = [item for item in unique_data if len(item) > 21]
# Then add each short entry that is not the prefix of an entry already kept
for item in unique_data:
    if len(item) == 21 and not any(element.startswith(item) for element in result):
        result.append(item)
# I am not sure you need the output sorted; sorted() orders the strings
# lexicographically, which matches date order here (same month and year)
print(sorted(result))
Upvotes: 0
Reputation: 17884
You can use a dictionary:
import re

lst = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
dct = {re.match(r'(temp/.+?)/.*', i).group(1): i for i in sorted(lst, key=len)}
# {'temp/date=20-07-2019': 'temp/date=20-07-2019/', 'temp/date=21-07-2019': 'temp/date=21-07-2019/', 'temp/date=22-07-2019': 'temp/date=22-07-2019/temp=22-07-2019/'}
print(list(dct.values()))
# ['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
Upvotes: 0
Reputation: 92874
If your items have a fixed format (temp/date=DD-MM-YYYY/):
d = {}
lst = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/',
       'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
for s in lst:
    k = s[:21]
    if k not in d or len(s) > len(d[k]):
        d[k] = s
print(list(d.values()))
The output:
['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
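The slice s[:21] only works while every entry starts with exactly 21 characters of prefix. As a rough sketch of the same idea without the hard-coded width, the key could be built from the leading path components instead. The function name longest_per_prefix and the depth parameter are made up for this illustration, and it assumes every entry has at least that many '/'-separated components:

```python
def longest_per_prefix(paths, depth=2):
    """Group entries by their first `depth` path components
    and keep the longest entry seen in each group."""
    longest = {}
    for s in paths:
        # 'temp/date=22-07-2019/temp=22-07-2019/' -> 'temp/date=22-07-2019'
        key = '/'.join(s.split('/')[:depth])
        if key not in longest or len(s) > len(longest[key]):
            longest[key] = s
    # dicts keep insertion order (Python 3.7+), so groups come out
    # in the order their first entry appeared in the input
    return list(longest.values())


lst = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/',
       'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
print(longest_per_prefix(lst))
```

Like the fixed-width version, this keeps only one entry per date, so two different long paths under the same date would collapse into the longer one.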
Upvotes: 1
Reputation: 71461
You can use any() with a list comprehension:
r = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
result = [i for i in r if not any(i in c and len(c) > len(i) for c in r)]
Output:
['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
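Note that i in c is a substring test, so it would also drop an entry that occurs in the middle of a longer one. If only a leading match should count, a possible variant uses str.startswith instead. This is just a sketch; the function name drop_prefixes is made up here, and it assumes exact duplicates should be kept rather than removed:

```python
def drop_prefixes(paths):
    """Keep only the entries that are not a proper prefix of another entry."""
    return [
        p for p in paths
        # `other != p` keeps an entry from being compared with itself
        if not any(other != p and other.startswith(p) for other in paths)
    ]


r = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/',
     'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
print(drop_prefixes(r))
```

Both versions compare every pair of entries, which is fine for short lists but grows quadratically with the input size.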
Upvotes: 3