TIMEX

Reputation: 271774

How to remove these duplicates in a list (python)

biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link':'u2.com'}
]

I would like to remove the THIRD element of the list, because its "link" value ("u2.com") is a duplicate. I don't want duplicate "link" values. What is the most efficient code to do this, so that the result is:

biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'}
]

I have tried many ways, including many nested "for ... in ..." loops, but they are inefficient and verbose.

Upvotes: 1

Views: 2503

Answers (5)

Ian Clelland

Reputation: 44132

Make a new dictionary, with 'u2.com' and 'abc.com' as the keys, and your list elements as the values. The dictionary will enforce uniqueness. Something like this:

uniquelist = dict((element['link'], element) for element in reversed(biglist))

(The reversed() is there so that the first elements in the list are the ones that remain in the dictionary; if you take it out, you will keep the last occurrence of each link instead.)

Then you can get elements back into a list like this:

biglist = list(uniquelist.values())
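Put together with the sample data, the whole thing looks like this (a sketch; the list() call only matters on Python 3, where values() returns a view rather than a list):

```python
biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

# Key on 'link'; iterating in reverse means the first occurrence in
# the original list is the one that ends up in the dictionary.
uniquelist = dict((element['link'], element) for element in reversed(biglist))

biglist = list(uniquelist.values())
# biglist now holds one entry per link: 'U2 Band' and 'ABC Station'
```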

Upvotes: 3

Imran

Reputation: 91009

You can use defaultdict to group items by link, then remove duplicates if you want to.

from collections import defaultdict

nodupes = defaultdict(list)
for d in biglist:
    nodupes[d['link']].append(d['title'])

This will give you:

defaultdict(<type 'list'>, {'abc.com': ['ABC Station'], 'u2.com': ['U2 Band', 
'Live Concert by U2']})
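The grouping alone does not drop anything; one way to turn the grouped dict back into a deduplicated list, keeping only the first title seen per link (a sketch, not part of the original answer):

```python
from collections import defaultdict

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

groups = defaultdict(list)
for d in biglist:
    groups[d['link']].append(d['title'])

# Rebuild the list, keeping only the first title recorded for each link
deduped = [{'title': titles[0], 'link': link}
           for link, titles in groups.items()]
```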

Upvotes: 0

Alex Martelli

Reputation: 881675

Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:

biglist = [ 
    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'ABC Station','link':'abc.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'} 
]

known_links = set()
newlist = []

for d in biglist:
    link = d['link']
    if link in known_links:
        continue
    newlist.append(d)
    known_links.add(link)

biglist[:] = newlist

Upvotes: 8

steveha

Reputation: 76715

biglist = \
[ 
    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'ABC Station','link':'abc.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'} 
]

def dedupe(lst):
    d = {}
    for x in lst:
        link = x["link"]
        if link in d:
            continue
        d[link] = x
    return list(d.values())

lst = dedupe(biglist)

dedupe() keeps the first of any duplicates.

Upvotes: 1

dcrosta

Reputation: 26258

You can sort the list, using the link field of each dictionary as the sort key, then iterate through the list once and remove duplicates (or rather, create a new list with duplicates removed, as is the Python idiom), like so:

# sort the list using the 'link' item as the sort key
biglist.sort(key=lambda elt: elt['link'])

newbiglist = []
for item in biglist:
    if newbiglist == [] or item['link'] != newbiglist[-1]['link']:
        newbiglist.append(item)

This code will give you the first element (relative ordering in the original biglist) for any group of "duplicates". This is true because the .sort() algorithm used by Python is guaranteed to be a stable sort -- it does not change the order of elements determined to be equal to one another (in this case, elements with the same link).
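Since the list is sorted first anyway, the same scan can also be expressed with itertools.groupby, which yields each run of equal links as a group (a sketch, not from the original answer):

```python
from itertools import groupby

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

# Stable sort groups equal links together without reordering duplicates
biglist.sort(key=lambda elt: elt['link'])

# next(group) takes the first element of each run of equal links
newbiglist = [next(group)
              for _, group in groupby(biglist, key=lambda elt: elt['link'])]
```

Note that the result is in link-sorted order ('abc.com' before 'u2.com'), just as with the explicit loop above.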

Upvotes: 2
