Reputation: 309
My question is similar to this, but instead of removing full duplicates, I'd like to remove consecutive partial "duplicates" from a list in Python.
For my particular use case, I want to remove words that start with the same character when they appear consecutively, and I want to be able to specify that character. For this example it is #, so
['#python', 'is', '#great', 'for', 'handling',
'text', '#python', '#text', '#nonsense', '#morenonsense', '.']
should become
['#python', 'is', '#great', 'for', 'handling', 'text', '.']
Upvotes: 5
Views: 350
Reputation: 148880
A single pass is enough, provided you keep some context: the previous element and whether the element before it was kept.
def filter_lst(lst, char):
    if not lst:  # nothing to filter; also avoids an IndexError on lst[0]
        return []
    res = []  # the future returned value
    keep = True  # initialize context
    old = lst[0]
    for word in lst[1:]:  # and iterate (first element is already in old)
        if old[0] != char or (keep and word[0] != char):
            res.append(old)
            keep = True
        else:
            keep = False
        old = word
    if keep or (old[0] != char):  # don't forget last element!
        res.append(old)
    return res
It gives:
>>> lst = ['#python', 'is', '#great', 'for', 'handling',
...        'text', '#python', '#text', '#nonsense', '#morenonsense', '.']
>>> filter_lst(lst, '#')
['#python', 'is', '#great', 'for', 'handling', 'text', '.']
Upvotes: 1
Reputation: 164623
Here's one solution using itertools.groupby. The idea is to group consecutive items by whether their first character equals a given k. A group is dropped only when it meets both criteria: its items start with k and it contains more than one item. Every other group is yielded unchanged.
from itertools import groupby

L = ['#python', 'is', '#great', 'for', 'handling', 'text',
     '#python', '#text', '#nonsense', '#morenonsense', '.']

def list_filter(L, k):
    # group consecutive items by whether they start with k
    grouper = groupby(L, key=lambda x: x[0] == k)
    for i, j in grouper:
        items = list(j)
        # skip only runs of two or more items that start with k
        if not (i and len(items) > 1):
            yield from items

res = list_filter(L, '#')
print(list(res))
['#python', 'is', '#great', 'for', 'handling', 'text', '.']
Upvotes: 3
Reputation: 82889
You could use itertools.groupby:
>>> from itertools import groupby
>>> lst = ['#python', 'is', '#great', 'for', 'handling', 'text', '#python', '#text', '#nonsense', '#morenonsense', '.']
>>> [s for k, g in ((k, list(g)) for k, g in groupby(lst, key=lambda s: s.startswith("#")))
... if not k or len(g) == 1 for s in g]
...
['#python', 'is', '#great', 'for', 'handling', 'text', '.']
This groups consecutive elements by whether they start with #, then keeps only the groups that do not, or that contain a single element.
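Since the question asks for a configurable character, the same grouping idea can be wrapped in a function. This is only a minimal sketch: the name drop_prefixed_runs and its prefix parameter are made up for illustration and do not come from any of the answers. It uses str.startswith, so empty strings in the list do not raise an IndexError.

```python
from itertools import groupby


def drop_prefixed_runs(words, prefix):
    """Drop every run of two or more consecutive words that start with prefix.

    Hypothetical helper name; the prefix is passed in instead of hardcoded.
    """
    result = []
    # groupby yields one group per run of consecutive words sharing the same key
    for starts_with_prefix, group in groupby(words, key=lambda w: w.startswith(prefix)):
        run = list(group)
        # keep words without the prefix, and prefixed words that stand alone
        if not starts_with_prefix or len(run) == 1:
            result.extend(run)
    return result


lst = ['#python', 'is', '#great', 'for', 'handling',
       'text', '#python', '#text', '#nonsense', '#morenonsense', '.']
print(drop_prefixed_runs(lst, '#'))
```

With the list from the question this should print the same result as above. Because startswith accepts any string, a prefix longer than one character would also work.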
Upvotes: 5