user2120121
user2120121

Reputation: 655

Algorithm - What is the best algorithm for detecting duplicate numbers in small array?

What is the best algorithm for detecting duplicate numbers in array, the best in speed, memory and avoiving overhead. Small Array like [5,9,13,3,2,5,6,7,1] Note that 5 i dublicate.

After searching and reading about sorting algorithms, I realized that I will use one of these algorithms, Quick Sort, Insertion Sort or Merge Sort.

But actually I am really confused about what to use in my case which is a small array.

Thanks in advance.

Upvotes: 1

Views: 341

Answers (2)

paxdiablo
paxdiablo

Reputation: 881563

To be honest, with that size of array, you may as well choose the O(n2) solution (checking every element against every other element).

You'll generally only need to worry about performance if/when the array gets larger. For small data sets like this, you could well have found the duplicate with an 'inefficient' solution before the sort phase of an efficient solution will have finished :-)

In other words, you can use something like (pseudo-code):

for idx1 = 0 to nums.len - 2 inclusive:
    for idx2 = idx1 + 1 to nums.len - 1 inclusive:
        if nums[idx1] == nums[idx2]:
            return nums[idx1]
return no dups found

This finds the first value in the array which has a duplicate.

If you want an exhaustive list of duplicates, then just add the duplicate value to another (initially empty) array (once only per value) and keep going.

You can sort it using any half-decent algorithm though, for a data set of the size you're discussing, even a bubble sort would probably be adequate. Then you just process the sorted items sequentially, looking for runs of values but it's probably overkill in your case.

Upvotes: 4

skypjack
skypjack

Reputation: 50550

Two good approaches depend on the fact that you know or not the range from which numbers are picked up.

Case 1: the range is known.

Suppose you know that all numbers are in the range [a, b[, thus the length of the range is l=b-a.

You can create an array A the length of which is l and fill it with 0s, thus iterate over the original array and for each element e increment the value of A[e-a] (here we are actually mapping the range in [0,l[).

Once finished, you can iterate over A and find the duplicate numbers. In fact, if there exists i such that A[i] is greater than 1, it implies that i+a is a repeated number.

The same idea is behind counting sort, and it works fine also for your problem.

Case 2: the range is not known.

Quite simple. Slightly modify the approach above mentioned, instead of an array use a map where the keys are the number from your original array and the values are the times you find them. At the end, iterate over the set of keys and search those that have been found more then once.

Note.

In both the cases above mentioned, the complexity should be O(N) and you cannot do better, for you have at least to visit all the stored values. Look at the first example: we iterate over two arrays, the lengths of which are N and l<=N, thus the complexity is at max 2*N, that is O(N). The second example is indeed a bit more complex and dependent on the implementation of the map, but for the sake of simplicity we can safely assume that it is O(N).

In memory, you are constructing data structures the sizes of which are proportional to the number of different values contained in the original array.

As it usually happens, memory occupancy and performance are the keys of your choice. Greater the former, better the latter and vice versa. As suggested in another response, if you know that the array is small, you can safely rely on an algorithm the complexity of which is O(N^2), but that does not require memory at all.

Which is the best choice? Well, it depends on your problem, we cannot say.

Upvotes: 0

Related Questions