Reputation: 271674
This is actually an extension of this question. The answers of that question did not keep the "order" of the list after removing duplicates. How to remove these duplicates in a list (python)
biglist =
[
{'title':'U2 Band','link':'u2.com'},
{'title':'Live Concert by U2','link':'u2.com'},
{'title':'ABC Station','link':'abc.com'}
]
In this case, the 2nd element should be removed because a previous "u2.com" element already exists. However, the order should be kept.
Upvotes: 27
Views: 26968
Reputation: 11
Try this :
list = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
uniq = []
for i in list:
if i not in uniq:
uniq.append(i)
print list
print uniq
output will be :
['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
['aaa', 'aba', 'aea', 'baa', 'aac']
Upvotes: 1
Reputation: 1988
use set(), then re-sort using the index of the original list.
>>> mylist = ['c','a','a','b','a','b','c']
>>> sorted(set(mylist), key=lambda x: mylist.index(x))
['c', 'a', 'b']
Upvotes: 39
Reputation: 1675
This is an elegant and compact way, with list comprehension (but not as efficient as with dictionary):
mylist = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
[ v for (i,v) in enumerate(mylist) if v not in mylist[0:i] ]
And in the context of the answer:
[ v for (i,v) in enumerate(biglist) if v['link'] not in map(lambda d: d['link'], biglist[0:i]) ]
Upvotes: 3
Reputation: 7066
This page discusses different methods and their speeds: http://www.peterbe.com/plog/uniqifiers-benchmark
The recommended* method:
def f5(seq, idfun=None):
# order preserving
if idfun is None:
def idfun(x): return x
seen = {}
result = []
for item in seq:
marker = idfun(item)
# in old Python versions:
# if seen.has_key(marker)
# but in new ones:
if marker in seen: continue
seen[marker] = 1
result.append(item)
return result
f5(biglist,lambda x: x['link'])
*by that page
Upvotes: 3
Reputation: 107598
Generators are great.
def unique( seq ):
seen = set()
for item in seq:
if item not in seen:
seen.add( item )
yield item
biglist[:] = unique( biglist )
Upvotes: 13
Reputation: 881555
My answer to your other question, which you completely ignored!, shows you're wrong in claiming that
The answers of that question did not keep the "order"
Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:
biglist = [
{'title':'U2 Band','link':'u2.com'},
{'title':'ABC Station','link':'abc.com'},
{'title':'Live Concert by U2','link':'u2.com'}
]
known_links = set()
newlist = []
for d in biglist:
link = d['link']
if link in known_links: continue
newlist.append(d)
known_links.add(link)
biglist[:] = newlist
Upvotes: 26
Reputation: 132197
dups = {}
newlist = []
for x in biglist:
if x['link'] not in dups:
newlist.append(x)
dups[x['link']] = None
print newlist
produces
[{'link': 'u2.com', 'title': 'U2 Band'}, {'link': 'abc.com', 'title': 'ABC Station'}]
Note that here I used a dictionary. This makes the test not in dups
much more efficient than using a list.
Upvotes: 1
Reputation: 5169
I think using a set should be pretty efficent.
seen_links = set()
for index in len(biglist):
link = biglist[index]['link']
if link in seen_links:
del(biglist[index])
seen_links.add(link)
I think this should come in at O(nlog(n))
Upvotes: 0
Reputation: 992857
A super easy way to do this is:
def uniq(a):
if len(a) == 0:
return []
else:
return [a[0]] + uniq([x for x in a if x != a[0]])
This is not the most efficient way, because:
However, for simple uses (no more than a few hundred items, not performance critical) it is sufficient.
Upvotes: 0