TIMEX
TIMEX

Reputation: 271674

Remove duplicates in a list while keeping its order (Python)

This is actually an extension of this question. The answers of that question did not keep the "order" of the list after removing duplicates. How to remove these duplicates in a list (python)

biglist = 

[ 

    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'}

]

In this case, the 2nd element should be removed because a previous "u2.com" element already exists. However, the order should be kept.

Upvotes: 27

Views: 26968

Answers (9)

falco
falco

Reputation: 11

Try this :

list = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
uniq = []
for i in list:
               if i not in uniq:
                   uniq.append(i)

print list
print uniq

output will be :

['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
['aaa', 'aba', 'aea', 'baa', 'aac']

Upvotes: 1

egafni
egafni

Reputation: 1988

use set(), then re-sort using the index of the original list.

>>> mylist = ['c','a','a','b','a','b','c']
>>> sorted(set(mylist), key=lambda x: mylist.index(x))
['c', 'a', 'b']

Upvotes: 39

rools
rools

Reputation: 1675

This is an elegant and compact way, with list comprehension (but not as efficient as with dictionary):

mylist = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]

[ v for (i,v) in enumerate(mylist) if v not in mylist[0:i] ]

And in the context of the answer:

[ v for (i,v) in enumerate(biglist) if v['link'] not in map(lambda d: d['link'], biglist[0:i]) ]

Upvotes: 3

Tarnay Kálmán
Tarnay Kálmán

Reputation: 7066

This page discusses different methods and their speeds: http://www.peterbe.com/plog/uniqifiers-benchmark

The recommended* method:

def f5(seq, idfun=None):  
    # order preserving 
    if idfun is None: 
        def idfun(x): return x 
    seen = {} 
    result = [] 
    for item in seq: 
        marker = idfun(item) 
        # in old Python versions: 
        # if seen.has_key(marker) 
        # but in new ones: 
        if marker in seen: continue 
        seen[marker] = 1 
        result.append(item) 
    return result

f5(biglist,lambda x: x['link'])

*by that page

Upvotes: 3

Jochen Ritzel
Jochen Ritzel

Reputation: 107598

Generators are great.

def unique( seq ):
    seen = set()
    for item in seq:
        if item not in seen:
            seen.add( item )
            yield item

biglist[:] = unique( biglist )

Upvotes: 13

Alex Martelli
Alex Martelli

Reputation: 881555

My answer to your other question, which you completely ignored!, shows you're wrong in claiming that

The answers of that question did not keep the "order"

  • my answer did keep order, and it clearly said it did. Here it is again, with added emphasis to see if you can just keep ignoring it...:

Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:

biglist = [ 
    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'ABC Station','link':'abc.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'} 
]

known_links = set()
newlist = []

for d in biglist:
  link = d['link']
  if link in known_links: continue
  newlist.append(d)
  known_links.add(link)

biglist[:] = newlist

Upvotes: 26

Peter
Peter

Reputation: 132197

dups = {}
newlist = []
for x in biglist:
    if x['link'] not in dups:
      newlist.append(x)
      dups[x['link']] = None

print newlist

produces

[{'link': 'u2.com', 'title': 'U2 Band'}, {'link': 'abc.com', 'title': 'ABC Station'}]

Note that here I used a dictionary. This makes the test not in dups much more efficient than using a list.

Upvotes: 1

ABentSpoon
ABentSpoon

Reputation: 5169

I think using a set should be pretty efficent.

seen_links = set()
for index in len(biglist):
    link = biglist[index]['link']
    if link in seen_links:
        del(biglist[index])
    seen_links.add(link)

I think this should come in at O(nlog(n))

Upvotes: 0

Greg Hewgill
Greg Hewgill

Reputation: 992857

A super easy way to do this is:

def uniq(a):
    if len(a) == 0:
        return []
    else:
        return [a[0]] + uniq([x for x in a if x != a[0]])

This is not the most efficient way, because:

  • it searches through the whole list for every element in the list, so it's O(n^2)
  • it's recursive so uses a stack depth equal to the length of the list

However, for simple uses (no more than a few hundred items, not performance critical) it is sufficient.

Upvotes: 0

Related Questions