Remove later strings that start with the same prefix as an earlier string in a Python list

Question

I have a list like this:

['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']

I want to remove every string whose first 4 characters match the first 4 characters of an earlier string in the list. For example, 'a b e' should be removed because 'a b d' comes before it.

The new list should look like this:

['a b d', 'c d j', 'w x y']

How can I do this?

(NOTE: The list is sorted, as per @Martijn Pieters' comment)

Martijn Pieters · Accepted Answer

Use a generator function that remembers the prefixes it has already let through:

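def remove_starts(lst):
    seen = []  # 4-character prefixes of the strings kept so far
    for elem in lst:
        # str.startswith() accepts a tuple and is True if any prefix matches
        if elem.startswith(tuple(seen)):
            continue
        yield elem
        seen.append(elem[:4])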

The function skips any string that starts with one of the prefixes in seen, and appends the first 4 characters of every string it lets through to that list.
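Because every stored prefix has the same length, an alternative (not part of the original answer) is to keep the prefixes in a set and test membership of elem[:4] directly, which avoids rebuilding a tuple and scanning every prefix for each string. The sketch below assumes every string is at least 4 characters long; a shorter string is compared on its whole value rather than acting as a prefix, so it behaves differently from the startswith() version in that case. The function name remove_starts_set and the parameter n are hypothetical.

```python
def remove_starts_set(lst, n=4):
    """Yield each string whose first n characters have not been seen before.

    Assumes every string is at least n characters long (hypothetical helper).
    """
    seen = set()  # n-character prefixes of the strings kept so far
    for elem in lst:
        key = elem[:n]  # the prefix used for matching
        if key in seen:
            continue  # an earlier string already had this prefix
        seen.add(key)
        yield elem


lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
print(list(remove_starts_set(lst)))  # ['a b d', 'c d j', 'w x y']
```

Like the generator above, this does not need sorted input: for ['w x y', 'a b d', 'w x z'] it keeps 'w x y' and 'a b d'.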

Demo:

>>> lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
>>> def remove_starts(lst):
...     seen = []
...     for elem in lst:
...         if elem.startswith(tuple(seen)):
...             continue
...         yield elem
...         seen.append(elem[:4])
...
>>> list(remove_starts(lst))
['a b d', 'c d j', 'w x y']

If your input is sorted, this can be simplified to:

def remove_starts(lst):
    seen = ()  # empty tuple: startswith(()) is always False
    for elem in lst:
        if elem.startswith(seen):
            continue
        yield elem
        seen = elem[:4]  # only the most recently kept prefix is needed

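Because sorted input keeps strings with the same prefix next to each other, only the most recently kept prefix needs to be checked, instead of testing each string against every prefix seen so far.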
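For sorted input, the same "keep the first of each adjacent run" idea can also be written with itertools.groupby, which batches consecutive items that share a key. This is an alternative sketch, not the original answer's code; the name remove_starts_sorted and the parameter n are hypothetical, and it assumes every string is at least 4 characters long.

```python
from itertools import groupby


def remove_starts_sorted(lst, n=4):
    """Keep the first string of each run sharing the same n-character prefix.

    Only correct when equal prefixes are adjacent, e.g. for sorted input.
    """
    # groupby yields (key, group) for each run of consecutive equal keys;
    # next(group) takes the first string of that run.
    return [next(group) for _, group in groupby(lst, key=lambda s: s[:n])]


lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
print(remove_starts_sorted(lst))  # ['a b d', 'c d j', 'w x y']
```

As with the simplified generator, this relies on sorting: on unsorted input such as ['w x y', 'a b d', 'w x z'], the two 'w x ' strings are not adjacent, so both would be kept.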
