Reputation: 1320
Given a Python list, I want to remove consecutive 'duplicates'. However, the duplicate value is an attribute of the list item (in this example, the tuple's first element).
Input:
[(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
Desired Output:
[(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
I cannot use a set or dict, because order is important.
I cannot use a list comprehension such as [x for x in somelist if not determine(x)], because the check depends on the predecessor.
What I want is something like:
mylist = [...]
for i in range(len(mylist)):
    if mylist[i-1].attr == mylist[i].attr:
        mylist.remove(i)
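As sketched, this loop would not work even with the attribute lookup filled in: list.remove() deletes by value rather than by index, deleting while iterating forward shifts the indices still to be visited, and at i == 0, mylist[i-1] wraps around to the last element. A minimal in-place sketch of the same idea, assuming the attribute is the tuple's first element (remove_consecutive_in_place and its key parameter are hypothetical names), walks the indices backwards and uses del:

```python
def remove_consecutive_in_place(items, key=lambda item: item[0]):
    """Delete, in place, every item whose key equals its predecessor's key.

    Hypothetical helper; 'key' defaults to the tuple's first element.
    """
    # Walk backwards so deleting items[i] never shifts an index not yet visited.
    # Stopping at 1 means items[0] is never compared with items[-1].
    for i in range(len(items) - 1, 0, -1):
        if key(items[i - 1]) == key(items[i]):
            del items[i]


mylist = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
remove_consecutive_in_place(mylist)
print(mylist)  # [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
```

Each del shifts the tail of the list, so this is quadratic in the worst case; building a new list avoids that cost.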
What is the preferred way to solve this in Python?
Upvotes: 19
Views: 1546
Reputation: 2407
It's somewhat overkill, but you can use reduce, too:
from functools import reduce
data=[(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
reduce(lambda rslt,t: rslt if rslt[-1][0]==t[0] else rslt+[t], data, [data[0]])
[(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
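Two caveats with this reduce call: seeding it with [data[0]] raises IndexError on an empty list, and rslt+[t] copies the accumulator on every step, which makes the reduction quadratic. A sketch that keeps the reduce style but starts from an empty list and appends in place (dedupe_reduce is a hypothetical name) might look like this:

```python
from functools import reduce


def dedupe_reduce(data):
    """Keep the first item of each run of equal first elements (hypothetical helper)."""
    def step(acc, item):
        # Append only when nothing has been kept yet or the first element changes.
        if not acc or acc[-1][0] != item[0]:
            acc.append(item)
        return acc  # reduce needs the accumulator back, not append's None

    return reduce(step, data, [])


data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
print(dedupe_reduce(data))  # [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
print(dedupe_reduce([]))    # []
```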
Upvotes: 1
Reputation: 1662
You could also use enumerate and a list comprehension:
>>> data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
>>> [v for ix, v in enumerate(data) if not ix or v[0] != data[ix-1][0]]
[(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
Upvotes: 2
Reputation: 21
If you want to stick to a list comprehension, you can use something like this:
>>> li = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (2, 'a')]
>>> [li[i] for i in range(len(li)) if not i or li[i] != li[i-1]]
[(1, 'a'), (2, 'a'), (3, 'a'), (2, 'a')]
Please note that not i is the Pythonic way of writing i == 0.
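Note that this comprehension compares whole tuples. It works on the sample above only because every second element is 'a'; with the question's data, (2, 'b') and (2, 'c') differ as tuples, so both survive. Comparing only the first element, as the question asks, gives the desired output:

```python
li = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

# Whole-tuple comparison: (2, 'c') is kept because it differs from (2, 'b').
whole = [li[i] for i in range(len(li)) if not i or li[i] != li[i - 1]]
print(whole)  # [(1, 'a'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

# First-element comparison, as the question requires.
by_first = [li[i] for i in range(len(li)) if not i or li[i][0] != li[i - 1][0]]
print(by_first)  # [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
```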
Upvotes: 2
Reputation: 8180
You can easily zip the list with itself, offset by one. Every element except the first is paired with its predecessor:
>>> L = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
>>> list(zip(L[1:], L))
[((2, 'b'), (1, 'a')), ((2, 'b'), (2, 'b')), ((2, 'c'), (2, 'b')), ((3, 'd'), (2, 'c')), ((2, 'e'), (3, 'd'))]
The first element of L is always part of the result; for the rest, filter the pairs on the condition and keep the first element of each pair:
>>> [L[0]]+[e for e, f in zip(L[1:], L) if e[0]!=f[0]]
[(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
Upvotes: 1
Reputation: 19885
You can use itertools.groupby (demonstration with more data):
from itertools import groupby
from operator import itemgetter
data = [(1, 'a'), (2, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (3, 'a')]
[next(group) for key, group in groupby(data, key=itemgetter(0))]
Output:
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (2, 'a'), (3, 'a')]
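Since the question talks about an attribute (mylist[i].attr), the same pattern should carry over to objects by swapping itemgetter for operator.attrgetter. A sketch, assuming a hypothetical Item named tuple with an attr field:

```python
from itertools import groupby
from operator import attrgetter
from typing import NamedTuple


class Item(NamedTuple):
    # Hypothetical record type standing in for the question's list items.
    attr: int
    label: str


items = [Item(1, 'a'), Item(2, 'b'), Item(2, 'b'), Item(2, 'c'), Item(3, 'd'), Item(2, 'e')]

# groupby only merges *adjacent* items with equal keys, so the later Item(2, 'e')
# starts a new group; next(group) takes the first item of each run.
result = [next(group) for _, group in groupby(items, key=attrgetter('attr'))]
print([(i.attr, i.label) for i in result])  # [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
```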
For completeness, an iterative approach based on other answers:
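result = []
for first, second in zip(data, data[1:]):
    if first[0] != second[0]:
        result.append(first)
# The last element is never 'first' in any pair, so add it explicitly.
if data:
    result.append(data[-1])
result
Output:
[(1, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (3, 'a')]
Note that this keeps the last duplicate of each run instead of the first, which is why (2, 'b') appears here where the groupby version has (2, 'a').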
Upvotes: 17
Reputation: 22493
If I am not mistaken, you only need to look up the last value in the result.
test = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'a'), (4, "a"), (4, "a")]
result = []
for i in test:
    if result and i[0] == result[-1][0]:  # edited since OP considers (1, "a") and (1, "b") duplicates
    #if result and i == result[-1]:
        continue
    else:
        result.append(i)
print(result)
Output:
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'a'), (4, 'a')]
Upvotes: 7
Reputation: 523
I'd change Henry Yik's proposal slightly to make it simpler. Not sure if I am missing something.
inputList = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (2, 'a')]
outputList = []
lastItem = None
for item in inputList:
    if item != lastItem:
        outputList.append(item)
        lastItem = item
print(outputList)
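Two things this simplification does miss. It compares whole tuples, so with the question's data both (2, 'b') and the following (2, 'c') would be kept, whereas Henry Yik's version compares only i[0]. And using None as the initial lastItem means a leading None in the input would be silently dropped. A sketch that addresses both, using a fresh object() as a sentinel and a key function, written as a generator so it also works on arbitrary iterables (dedupe_consecutive and key are hypothetical names):

```python
def dedupe_consecutive(iterable, key=lambda item: item[0]):
    """Yield the first item of each run of items with equal key(item).

    Hypothetical helper; 'key' defaults to the tuple's first element.
    """
    sentinel = object()  # unique object, so it cannot collide with a real key (unlike None)
    last_key = sentinel
    for item in iterable:
        k = key(item)
        if last_key is sentinel or k != last_key:
            yield item
        last_key = k


inputList = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
print(list(dedupe_consecutive(inputList)))  # [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
```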
Upvotes: 1
Reputation: 88236
In order to remove consecutive duplicates, you could use itertools.groupby:
from itertools import groupby
l = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
# Without key=, groupby compares whole tuples and each k is already the tuple itself.
[k for k, _ in groupby(l)]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
Upvotes: 11