Reputation: 49035
I have a list of strings that should be unique, and I want to be able to check for duplicates quickly. Specifically, I'd like to take the original list and produce a new list containing any repeated items. I don't care how many times an item is repeated, so the result doesn't need to list a word twice if there are two duplicates of it.
Unfortunately, I can't think of a way to do this that wouldn't be clunky. Any suggestions?
EDIT: Thanks for the answers; I thought I'd make a clarification. I'm not concerned with having a list of uniques for its own sake. I'm generating the list from text files, and I want to know what the duplicates are so I can go into the text files and remove them if any show up.
Upvotes: 0
Views: 1181
Reputation: 21
Here's a simple one-liner:
>>> l = ['a', 'a', 3, 'r', 'r', 's', 's', 2, 3, 't', 'y', 'a', 'w', 'r']
>>> [v for i, v in enumerate(l) if l[i:].count(v) > 1 and l[:i].count(v) == 0]
['a', 3, 'r', 's']
enumerate yields (index, value) pairs, which we use to slice the input list: for each position we check whether the current value occurs again ahead of the current index, and whether we have already seen it behind.
Upvotes: 2
Reputation: 9748
The solutions based on set have a small drawback: they only work for hashable objects.
The solution based on itertools.groupby, on the other hand, works for any comparable objects (e.g. lists, which are comparable but not hashable).
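For instance, a minimal sketch (the sample data is my own) where the items are lists, which set() would reject with a TypeError:

from itertools import groupby

data = [[1, 2], [3, 4], [1, 2], [5]]  # lists are unhashable, so set(data) fails
data.sort()                           # groupby needs equal items to be adjacent
dups = [k for k, g in groupby(data) if len(list(g)) > 1]
print(dups)  # [[1, 2]]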
Upvotes: 0
Reputation: 87134
Personally, I think this is the simplest way to do it, with O(n) performance. It's similar to vartec's solution, but with no import required and no Python version dependencies to worry about:
def getDuplicates(iterable):
    d = {}
    for i in iterable:
        d[i] = d.get(i, 0) + 1           # count occurrences of each item
    return [i for i in d if d[i] > 1]    # keep only items seen more than once
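For example (on older Pythons the output order may differ, since it follows dict key ordering):

>>> getDuplicates(['a', 'b', 'a', 'c', 'b', 'a'])
['a', 'b']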
Upvotes: 0
Reputation: 597351
EDIT: OK, this doesn't work, since you want the duplicates only.
You have set; just do:
my_filtered_list = list(set(mylist))
A set is a data structure that, by nature, contains no duplicates.
my_filtered_list = list(dict.fromkeys(mylist).keys())
A dictionary maps each unique key to a value. We use that uniqueness characteristic to get rid of the duplicates.
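To be clear, both snippets deduplicate rather than extract the duplicates. A quick illustration of the dict-based version (on Python 3.7+, where dicts preserve insertion order):

>>> list(dict.fromkeys([3, 1, 3, 2, 1]))
[3, 1, 2]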
Upvotes: -1
Reputation:
If you don't care about the order of the duplicates, you can sort the list so that equal items end up adjacent and then compare neighbouring pairs:
a = [1, 2, 3, 4, 5, 4, 6, 4, 7, 8, 8]
b = sorted(a)
duplicates = set([x for x, y in zip(b[:-1], b[1:]) if x == y])
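For the sample list above, that yields (using sorted() just for a stable display):

>>> sorted(duplicates)
[4, 8]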
Upvotes: 1
Reputation: 9058
groupby from itertools will probably be useful here:
from itertools import groupby
duplicated = [k for k, g in groupby(sorted(l)) if len(list(g)) > 1]
Basically you use it to find elements that appear more than once.
NB: the call to sorted is needed, as groupby only works properly if the input is sorted.
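A quick worked example with made-up data:

>>> l = ['b', 'a', 'c', 'b', 'a']
>>> [k for k, g in groupby(sorted(l)) if len(list(g)) > 1]
['a', 'b']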
Upvotes: 6
Reputation: 319929
Definitely not the fastest way to do it (every count() call scans the whole list, making this O(n²)), but it seems to solve the problem:
>>> lst = [23, 32, 23, None]
>>> set(i for i in lst if lst.count(i) > 1)
{23}
Upvotes: 3
Reputation: 18441
This will create the set of duplicates in one line:
L = [1, 2, 3, 3, 4, 4, 4]
L_dup = set([i for i in L if L.count(i) > 1])
Upvotes: 4
Reputation: 28008
This code should work:
duplicates = set()
found = set()
for item in source:
    if item in found:
        duplicates.add(item)  # seen before: record as a duplicate
    else:
        found.add(item)       # first sighting: just remember it
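A minimal sketch of how this might plug into the asker's text-file workflow (the file name and one-word-per-line format are my assumptions):

duplicates = set()
found = set()
with open('words.txt') as f:        # hypothetical input file, one word per line
    for line in f:
        item = line.strip()
        if item in found:
            duplicates.add(item)    # seen before: it's a duplicate
        else:
            found.add(item)         # first sighting
print(sorted(duplicates))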
Upvotes: 20