Reputation:
How can I find duplicates in array when there is more than one duplicated element?
When the array is only one duplicated element (for example: 1, 2, 3, 4, 4, 4, 5, 6, 7) then it is very easy:
int duplicate(int* a, int s)
{
int x = a[0];
for(int i = 1; i < s; ++i)
{
x = x ^ a[i];
}
for(int i = 0; i < a[s]; ++i)
{
x = x ^ i;
}
return x;
}
But if the input array contains more than one duplicated element (for example: 1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 7), the above won't work. How can we solve this problem in O(n) time?
Upvotes: 0
Views: 1462
Reputation: 483
Using a set is one of the possible generic solutions. Example in c++:
template <typename T>
void filter_duplicates(T* arr, int length) {
std::unordered_set<T> set;
for (int i = 0; i < length; ++i) {
if (set.count(arr[i]) > 0) {
// then it's a duplicate
}
set.insert(arr[i]);
}
// the set contains all the items, unduplicated
}
As unordered_set
is implemented as a hash table, insertion and lookup are of amortized constant complexity. As a set can only contain unique keys, this effectively de-duplicates the items. We could finally convert back the set to an array. We could also use a map to count the occurrences.
If array elements are integers and that the maximum possible value is known, and fairly low, then the set can be replaced by a simple array either 1. of boolean or 2. of integer if we want to count the number of occurrences.
Upvotes: 1
Reputation: 10161
If space is no concern or the maximal number is quite low, you can simple use a kind of a bit-array and mark all already occurred numbers by setting the bit at the position of the number.
It'a a kind of HashSet with trivial (identity) hash-function.
Tests and set cost O(1)
time.
Upvotes: 1