Reputation: 43
Given an array/list
of integers
, output the duplicates.
Also, what I am really looking for: what solutions have best time performance? Best space performance? Is it possible to have both best time and best space performance? Just curious. Thank you!
For example: given the list [4,1,7,9,4,5,2,7,6,5,3,6,7]
, the answer would be [4,7,6,5]
(the order of the output does not matter).
I wrote up my solution in python
.
Here's one solution I wrote using a hash and binary search.
def binarySearch(array, number):
start = 0
end = len(array)
mid = (end + start) // 2
while (end > start):
mid = start + (end - start) // 2
if array[mid] == number:
return (mid, True)
elif number > array[mid]:
if start == mid:
return (mid + 1, False)
start = mid
else:
end = mid
return (mid, False)
def findDuplicatesWithHash(array):
duplicatesHash = {}
duplicates = []
for number in array:
try:
index,found = binarySearch(duplicates, number)
if duplicatesHash[number] == 0 and not found:
duplicates.insert(index, number)
except KeyError as error:
duplicatesHash[number] = 0
duplicatesSorted = sorted(duplicates, key=lambda tup: tup)
return duplicatesSorted
Upvotes: 3
Views: 2928
Reputation: 4820
There are multiple solutions to finding duplicates. Given this question is completely generic, one can assume that given a list of n
values, the number of duplicates lie in the range [0, n/2]
.
What are the possible methods you can think of?
Hash Table approach:
Store values while traversing the list if value already doesn't exist in the hash table. If the value, exists, you have a duplicate.
Algorithm FindDuplicates(list)
hash_table <- HashTable()
duplicates <- List()
for value in list:
if value in hash_table:
duplicates.add(value)
else:
hash_table.add(value, true)
O(n)
to traverse through all valuesO(n)
to save all possible values in the hash table.Sort Array
Sort the array and traverse neighbour values.
Algorithm FindDuplicates(list)
list.sort()
duplicates <- Set()
for i <- [1, len(list)-1]:
if list[i] = list[i-1]:
duplicates.add(list[i])
O(n.logn) + O(n) = O(n.logn)
to sort and traverse all valuesO(1)
as no extra space created to produce duplicatesCheck for every value
For every value check if the value exists in the array.
Algorithm Search(i, list):
for j <- [0, len(list)-1] - [i]:
if list[j] = list[i]:
return true
return false
Algorithm FindDuplicates(list)
duplicates <- Set()
for i <- [1, len(list)-1]:
if Search(i, list):
duplicates.add(list[i])
Time: O(n^2)
number of comparisons are n*n(-1)
Space: O(1)
as no extra space created to produce duplicates
Note: space for the duplicates array cannot be included in the space complexity equations as that is the result we want.
Can you think of some more?
Upvotes: 3
Reputation: 2134
One way to get the duplicate:
l = [4,1,7,9,4,5,2,7,6,5,3,6]
import collections
print([item for item, count in collections.Counter(l).items() if count > 1])
Upvotes: 1
Reputation: 2269
Finding duplicates is very similar to sorting. That is, each element needs to be directly or indirectly compared to all other elements to find if there are duplicates. One could modify quicksort to output elements that have an adjacent matching element with O(n) spacial complexity and O(n*log(n)) average time complexity.
Upvotes: 1