Reputation: 398
I have a list like this:
['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
I want to remove every string that starts with the same 4 characters as a string that occurs earlier in the list. For example, 'a b e' would be removed because 'a b d' occurs before it.
The new list should look like this:
['a b d', 'c d j', 'w x y']
How can I do this?
(NOTE: The list is sorted, as per @Martijn Pieters' comment)
Upvotes: 3
Views: 126
Reputation: 180401
You could also use an OrderedDict: the keys are the first four characters, and each value is the first string that starts with those four characters:
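from collections import OrderedDict
lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
first = OrderedDict()
for s in lst:
    # setdefault only stores s when its 4-char prefix is new, so the first string wins
    first.setdefault(s[:4], s)
print(list(first.values()))
['a b d', 'c d j', 'w x y']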
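On Python 3.7 and later, a plain dict is guaranteed to preserve insertion order, so the same keep-the-first-string idea can be sketched without OrderedDict. The function name first_per_prefix and its n parameter are illustrative, not part of the answer above; setdefault is used so that a later string with the same prefix never replaces the first one.

```python
def first_per_prefix(strings, n=4):
    """Keep the first string for each distinct n-character prefix, in input order.

    Hypothetical helper for illustration; relies on dicts preserving
    insertion order (guaranteed since Python 3.7).
    """
    first = {}
    for s in strings:
        # setdefault only inserts when the prefix is new, so the first string wins
        first.setdefault(s[:n], s)
    return list(first.values())


lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
print(first_per_prefix(lst))  # ['a b d', 'c d j', 'w x y']
```

Unlike the sorted-only approaches, this also works when strings sharing a prefix are not adjacent.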
Upvotes: 2
Reputation: 1121744
Using a generator function to remember the starts:
def remove_starts(lst):
    seen = []
    for elem in lst:
        if elem.startswith(tuple(seen)):
            continue
        yield elem
        seen.append(elem[:4])
So the function skips anything that starts with one of the prefixes in seen, and appends the first 4 characters of anything it does let through to that list.
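Because every prefix stored in seen is exactly 4 characters long, a set with a plain membership test could stand in for rebuilding a tuple and calling startswith on every element. This is a sketch under the assumption that every string has at least 4 characters (true for the example list); with shorter strings the two checks can give different results. The name remove_starts_set is hypothetical.

```python
def remove_starts_set(lst):
    """Yield each string whose first 4 characters have not been seen yet.

    Assumes every string has at least 4 characters, so comparing the exact
    4-character prefix is equivalent to the startswith() check.
    """
    seen = set()
    for elem in lst:
        prefix = elem[:4]
        if prefix in seen:  # average O(1) lookup instead of scanning every stored prefix
            continue
        seen.add(prefix)
        yield elem


lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
print(list(remove_starts_set(lst)))  # ['a b d', 'c d j', 'w x y']
```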
Demo:
>>> lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
>>> def remove_starts(lst):
...     seen = []
...     for elem in lst:
...         if elem.startswith(tuple(seen)):
...             continue
...         yield elem
...         seen.append(elem[:4])
...
>>> list(remove_starts(lst))
['a b d', 'c d j', 'w x y']
If your input is sorted, this can be simplified to:
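def remove_starts(lst):
    seen = ()  # startswith(()) is always False, so the first element always passes
    for elem in lst:
        if elem.startswith(seen):
            continue
        yield elem
        seen = elem[:4]  # only the most recent prefix needs checking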
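This saves on prefix testing by only comparing against the most recent prefix, which is enough because sorting puts all strings that share a prefix next to each other.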
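For sorted input, the standard library's itertools.groupby expresses the same "keep the first of each run" idea directly, since it groups consecutive elements that share a key. A minimal sketch, assuming strings with a common 4-character prefix are adjacent; remove_starts_sorted is a hypothetical name.

```python
from itertools import groupby


def remove_starts_sorted(lst):
    """Return the first string of each run of consecutive equal 4-char prefixes.

    Assumes strings sharing a prefix are adjacent (e.g. the list is sorted).
    """
    # groupby yields (key, group_iterator) pairs for consecutive elements with
    # equal keys; next(group) takes the first element of each run.
    return [next(group) for _, group in groupby(lst, key=lambda s: s[:4])]


lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
print(remove_starts_sorted(lst))  # ['a b d', 'c d j', 'w x y']
```

Like the simplified generator, this only compares neighbouring elements, so on unsorted input a repeated prefix that is not adjacent to its first occurrence would be kept.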
Upvotes: 6