Reputation: 35
So I'm taking an intro computer science course right now, and I was wondering how to check if there were any duplicates within multiple lists. I've read up on these answers:
"How can I compare two lists in python and return matches" and "How to find common elements in list of lists?"
However, they're not quite what I'm looking for. Say for example I have this list of lists:
list_x = [[66,76],
[25,26,27],
[65,66,67,68],
[40,41,42,43,44],
[11,21,31,41,51,61]]
There are two duplicated values (66 and 41), although which values they are doesn't really matter to me. Is there a way to check whether any duplicates exist? What I'm looking for is a function that returns True if there are duplicates (or False, depending on what I want to do with the lists). I get the impression that I should use sets (which we have not covered yet, so I looked them up online), use for loops, or write my own function. If it turns out that I'll need to write my own function, please let me know, and I'll edit with an attempt later today!
Upvotes: 1
Views: 318
Reputation:
A very simple solution would be to use a list comprehension to flatten the list first and then use set and len together to test for duplicates:
>>> list_x = [[66,76],
... [25,26,27],
... [65,66,67,68],
... [40,41,42,43,44],
... [11,21,31,41,51,61]]
>>> flat = [y for x in list_x for y in x]
>>> flat # Just to demonstrate
[66, 76, 25, 26, 27, 65, 66, 67, 68, 40, 41, 42, 43, 44, 11, 21, 31, 41, 51, 61]
>>> len(flat) != len(set(flat)) # True because there are duplicates
True
>>>
>>> # This list has no duplicates...
... list_x = [[1, 2],
... [3, 4, 5],
... [6, 7, 8, 9],
... [10, 11, 12, 13],
... [14, 15, 16, 17, 18]]
>>> flat = [y for x in list_x for y in x]
>>> len(flat) != len(set(flat)) # ...so this is False
False
>>>
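Be warned, however, that this approach will be somewhat slow if list_x is large, because it builds the entire flattened list and a full set before comparing anything. If performance is a concern, you can use a lazy approach that combines a generator expression, any, and set.add, and stops at the first duplicate it finds: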
>>> list_x = [[66,76],
... [25,26,27],
... [65,66,67,68],
... [40,41,42,43,44],
... [11,21,31,41,51,61]]
>>> seen = set()
>>> any(y in seen or seen.add(y) for x in list_x for y in x)
True
>>>
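Since the question asks for a function that returns True or False, the lazy check above could be wrapped up roughly as follows. This is only a minimal sketch: the name has_duplicates is made up for illustration, and it assumes every element is hashable (as integers are).

```python
def has_duplicates(list_of_lists):
    """Return True if any value occurs more than once across all sublists."""
    seen = set()
    for sublist in list_of_lists:
        for value in sublist:
            if value in seen:
                return True  # stop at the first repeated value
            seen.add(value)
    return False


list_x = [[66, 76],
          [25, 26, 27],
          [65, 66, 67, 68],
          [40, 41, 42, 43, 44],
          [11, 21, 31, 41, 51, 61]]

print(has_duplicates(list_x))            # True
print(has_duplicates([[1, 2], [3, 4]]))  # False
```

The explicit loop does the same work as the any(...) one-liner, but it avoids relying on the return value of seen.add, which may be easier to read in an intro course.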
Upvotes: 3
Reputation: 6386
Here is a more straightforward solution with sets:
list_x = [[66,76],
          [25,26,27],
          [65,66,67,68],
          [40,41,42,43,44],
          [11,21,31,41,51,61]]
seen = set()
duplicated = set()
for lst in list_x:
    numbers = set(lst)  # only unique values from this sublist
    # intersect with the values seen so far and add the result to duplicated:
    duplicated |= numbers & seen
    # add this sublist's numbers to seen
    seen |= numbers
print(duplicated)
For more information about set and its operations, see the docs: https://docs.python.org/2/library/stdtypes.html#set
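To get the True/False answer the question asks for, the truth value of the resulting set can be used. The sketch below wraps the loop above in a helper called shared_values (a name invented here, not part of any library). One caveat worth flagging: because each sublist is converted to a set first, a value that repeats only inside a single sublist is not reported.

```python
def shared_values(list_of_lists):
    """Return the set of values that appear in more than one sublist."""
    seen = set()
    duplicated = set()
    for lst in list_of_lists:
        numbers = set(lst)            # unique values of this sublist
        duplicated |= numbers & seen  # values already met in an earlier sublist
        seen |= numbers
    return duplicated


list_x = [[66, 76],
          [25, 26, 27],
          [65, 66, 67, 68],
          [40, 41, 42, 43, 44],
          [11, 21, 31, 41, 51, 61]]

dupes = shared_values(list_x)
print(sorted(dupes))  # [41, 66]
print(bool(dupes))    # True, because the set is non-empty

# 1 repeats only within one sublist, so nothing is reported here:
print(bool(shared_values([[1, 1], [2, 3]])))  # False
```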
Upvotes: 0
Reputation: 1124558
Iterate and use a set to detect if there are duplicates:
seen = set()
dupes = [i for lst in list_x for i in lst if i in seen or seen.add(i)]
This makes use of the fact that seen.add() returns None, which is falsy, so the condition is true only when i in seen is true. A set is an unordered collection of unique values; the i in seen test is True if i is already part of the set.
Demo:
>>> list_x = [[66,76],
... [25,26,27],
... [65,66,67,68],
... [40,41,42,43,44],
... [11,21,31,41,51,61]]
>>> seen = set()
>>> [i for lst in list_x for i in lst if i in seen or seen.add(i)]
[66, 41]
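If the or seen.add(i) trick is hard to follow, the same logic can be spelled out as an explicit loop. This is just a sketch of the idea; the function name find_duplicates is invented for illustration. Like the comprehension, it lists a value once for every repeat, so a value that occurs three times shows up twice in the result.

```python
def find_duplicates(list_of_lists):
    """Return every value that was already seen earlier, in encounter order."""
    seen = set()
    dupes = []
    for lst in list_of_lists:
        for i in lst:
            if i in seen:
                dupes.append(i)  # i was met before, so it is a duplicate
            else:
                seen.add(i)      # first time we meet i
    return dupes


list_x = [[66, 76],
          [25, 26, 27],
          [65, 66, 67, 68],
          [40, 41, 42, 43, 44],
          [11, 21, 31, 41, 51, 61]]

print(find_duplicates(list_x))              # [66, 41]
print(find_duplicates([[1, 2], [1], [1]]))  # [1, 1]
```

Calling bool() on the returned list gives the True/False answer asked for in the question.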
Upvotes: 1