OhaiMac
OhaiMac

Reputation: 43

Find duplicates in a array/list of integers

Given an array/list of integers, output the duplicates.

Also, what I am really looking for: what solutions have best time performance? Best space performance? Is it possible to have both best time and best space performance? Just curious. Thank you!

For example: given the list [4,1,7,9,4,5,2,7,6,5,3,6,7], the answer would be [4,7,6,5] (the order of the output does not matter).

I wrote up my solution in python.

Here's one solution I wrote using a hash and binary search.

def binarySearch(array, number):
    start = 0
    end = len(array)
    mid = (end + start) // 2
    while (end > start):
        mid = start + (end - start) // 2
        if array[mid] == number:
            return (mid, True)
        elif number > array[mid]:
            if start == mid:
                return (mid + 1, False)
                start = mid
            else:
                end = mid

    return (mid, False)

def findDuplicatesWithHash(array):
    duplicatesHash = {}
    duplicates = []
    for number in array:
        try:
            index,found = binarySearch(duplicates, number)
            if duplicatesHash[number] == 0 and not found: 
                duplicates.insert(index, number)
        except KeyError as error:
            duplicatesHash[number] = 0

    duplicatesSorted = sorted(duplicates, key=lambda tup: tup)
    return duplicatesSorted

Upvotes: 3

Views: 2928

Answers (3)

p0lAris
p0lAris

Reputation: 4820

There are multiple solutions to finding duplicates. Given this question is completely generic, one can assume that given a list of n values, the number of duplicates lie in the range [0, n/2].

What are the possible methods you can think of?

  1. Hash Table approach:

    Store values while traversing the list if value already doesn't exist in the hash table. If the value, exists, you have a duplicate.

    Algorithm FindDuplicates(list)
    hash_table <- HashTable()
    duplicates <- List()
    for value in list:
        if value in hash_table:
            duplicates.add(value)
        else:
            hash_table.add(value, true)
    
    • Time: O(n) to traverse through all values
    • Space: O(n) to save all possible values in the hash table.
  2. Sort Array

    Sort the array and traverse neighbour values.

    Algorithm FindDuplicates(list)
    list.sort()
    duplicates <- Set()
    for i <- [1, len(list)-1]:
        if list[i] = list[i-1]:
            duplicates.add(list[i])
    
    • Time: O(n.logn) + O(n) = O(n.logn) to sort and traverse all values
    • Space: O(1) as no extra space created to produce duplicates
  3. Check for every value

    For every value check if the value exists in the array.

    Algorithm Search(i, list):
        for j <- [0, len(list)-1] - [i]:
            if list[j] = list[i]:
                return true
        return false
    
    Algorithm FindDuplicates(list)
    duplicates <- Set()
    for i <- [1, len(list)-1]:
        if Search(i, list):
            duplicates.add(list[i])
    

    Time: O(n^2) number of comparisons are n*n(-1) Space: O(1) as no extra space created to produce duplicates

Note: space for the duplicates array cannot be included in the space complexity equations as that is the result we want.

Can you think of some more?

Upvotes: 3

GAVD
GAVD

Reputation: 2134

One way to get the duplicate:

l = [4,1,7,9,4,5,2,7,6,5,3,6]
import collections

print([item for item, count in collections.Counter(l).items() if count > 1])

Upvotes: 1

Alden
Alden

Reputation: 2269

Finding duplicates is very similar to sorting. That is, each element needs to be directly or indirectly compared to all other elements to find if there are duplicates. One could modify quicksort to output elements that have an adjacent matching element with O(n) spacial complexity and O(n*log(n)) average time complexity.

Upvotes: 1

Related Questions