Eugene M

Reputation: 49035

In Python, how do I take a list and reduce it to a list of duplicates?

I have a list of strings that should be unique. I want to be able to check for duplicates quickly. Specifically, I'd like to be able to take the original list and produce a new list containing any repeated items. I don't care how many times an item is repeated, so the result doesn't need to contain a word twice just because it has two duplicates.

Unfortunately, I can't think of a way to do this that wouldn't be clunky. Any suggestions?

EDIT: Thanks for the answers; I thought I'd make a clarification. I'm not concerned with having a list of uniques for its own sake. I'm generating the list from text files, and I want to know what the duplicates are so I can go into the text files and remove them if any show up.

Upvotes: 0

Views: 1181

Answers (9)

marty

Reputation: 21

Here's a simple one-liner:

>>> l = ['a', 'a', 3, 'r', 'r', 's', 's', 2, 3, 't', 'y', 'a', 'w', 'r']
>>> [v for i, v in enumerate(l) if l[i:].count(v) > 1 and l[:i].count(v) == 0]
['a', 3, 'r', 's']

enumerate yields each value together with its index, which we use to slice the input list: we check whether any duplicate of the current value lies ahead of the current position, and whether we have already found one behind it (so each duplicate is reported only once).

Upvotes: 2

mariotomo

Reputation: 9748

The solutions based on 'set' have a small drawback: they only work for hashable objects.

The solution based on itertools.groupby, on the other hand, works for any comparable (sortable) objects, hashable or not, e.g. lists of lists (and, on Python 2, even dictionaries).
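
A minimal sketch of that idea (not from the original answer), using made-up lists of lists, which are unhashable but sort fine:

from itertools import groupby

data = [[1, 2], [3, 4], [1, 2], [5], [3, 4]]  # set(data) would raise TypeError
dupes = [k for k, g in groupby(sorted(data)) if len(list(g)) > 1]
print(dupes)  # [[1, 2], [3, 4]]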

Upvotes: 0

mhawke

Reputation: 87134

Personally, I think this is the simplest way to do it, with O(n) performance. It's similar to vartec's solution, but requires no import and has no Python version dependencies to worry about:

def getDuplicates(iterable):
    d = {}
    for i in iterable:
        d[i] = d.get(i, 0) + 1
    return [i for i in d if d[i] > 1]
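
A quick usage example (not part of the original answer); note that the result order depends on dict iteration order, which follows insertion order only on Python 3.7+:

>>> getDuplicates(['a', 'b', 'a', 'c', 'b'])
['a', 'b']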

Upvotes: 0

Bite code

Reputation: 597351

EDIT: OK, this doesn't work, since you want the duplicates only.

With Python 2.4 and later, you have set; just do:

my_filtered_list = list(set(mylist))

A set is a data structure that, by its nature, contains no duplicates.

With older Python versions:

my_filtered_list = list(dict.fromkeys(mylist).keys())

A dictionary maps each unique key to a value; we use that uniqueness to get rid of the duplicates.

Upvotes: -1

unbeknown

Reputation:

If you don't care about the order of the duplicates:

a = [1, 2, 3, 4, 5, 4, 6, 4, 7, 8, 8]
b = sorted(a)
# equal neighbours in the sorted list are duplicates
duplicates = set([x for x, y in zip(b[:-1], b[1:]) if x == y])  # duplicates == {4, 8}

Upvotes: 1

John Montgomery

Reputation: 9058

groupby from itertools will probably be useful here:

from itertools import groupby
duplicated = [k for k, g in groupby(sorted(l)) if len(list(g)) > 1]

Basically you use it to find elements that appear more than once...

N.B. the call to sorted is needed: groupby only groups adjacent equal elements, so it works properly only if the input is sorted.
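
A quick illustration, with made-up data, of why the sort matters:

>>> from itertools import groupby
>>> l = [1, 2, 1, 2]
>>> [k for k, g in groupby(l) if len(list(g)) > 1]
[]
>>> [k for k, g in groupby(sorted(l)) if len(list(g)) > 1]
[1, 2]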

Upvotes: 6

SilentGhost

Reputation: 319929

Definitely not the fastest way to do it, but it seems to solve the problem:

>>> lst = [23, 32, 23, None]
>>> set(i for i in lst if lst.count(i) > 1)
{23}

Upvotes: 3

Jason Coon

Reputation: 18441

This will build the set of duplicates in one line:

L = [1, 2, 3, 3, 4, 4, 4]
L_dup = set([i for i in L if L.count(i) > 1])  # L_dup == {3, 4}

Upvotes: 4

Staale

Reputation: 28008

This code should work:

duplicates = set()
found = set()
for item in source:  # source is the original list of strings
    if item in found:
        duplicates.add(item)  # seen before, so it's a duplicate
    else:
        found.add(item)  # first time we've seen this item
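
For instance, with a made-up sample list standing in for the strings read from the text files (set display order may vary):

>>> source = ['apple', 'pear', 'apple', 'plum', 'pear']
>>> duplicates = set()
>>> found = set()
>>> for item in source:
...     if item in found:
...         duplicates.add(item)
...     else:
...         found.add(item)
...
>>> duplicates
{'apple', 'pear'}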

Upvotes: 20
