Ali

Reputation: 5476

Removing duplicates from an array using std::map

I'm posting code that I wrote on collabedit in under 5 minutes (including figuring out the algorithm), so even at the risk of being made fun of for its efficiency, I wanted to ask my fellow experienced Stack Overflow algorithm enthusiasts about the problem.

Basically, this removes duplicate elements from an array. My approach: use std::map as my hash table and, for each element of the array with duplicates, add it to the new array only if its value hasn't been assigned yet; if it has been assigned, just skip it. At the end, return the unique array. Here is my code. The only thing I'm asking, in terms of an interview question: can my solution be more efficient?

#include <iostream>
#include <vector>
#include <map>

using namespace std;

vector<int> uniqueArr(int arr[], int size) {
    std::map<int, int> storedValues;   // lookup table of values already seen
    vector<int> uniqueArr;
    for (int i = 0; i < size; i++) {
        if (storedValues[arr[i]] == 0) {   // value not seen yet (operator[] default-inserts 0)
            uniqueArr.push_back(arr[i]);
            storedValues[arr[i]] = 1;      // mark value as seen
        }
    }
    return uniqueArr;
}

int main()
{
    const int size = 10;
    int arr[size] = {1, 2, 2, 4, 2, 5, 6, 5, 7, 1};
    vector<int> uniArr = uniqueArr(arr, size);
    cout << "Result: ";
    for (size_t i = 0; i < uniArr.size(); i++) cout << uniArr[i] << " ";
    cout << endl;
    return 0;
}

Upvotes: 0

Views: 5117

Answers (4)

Christian Rau

Reputation: 45948

First of all, there is no need for a map; a set is conceptually more correct, since you don't want to store any mapped values, only the keys.

Performance-wise, it might be a better idea to use a std::unordered_set instead of a std::set, as the former is hashed and can give you O(1) insert and lookup on average, whereas the latter is a balanced binary search tree, giving you only O(log n) access.

#include <unordered_set>

vector<int> uniqueArr(int arr[], int size)
{
    std::unordered_set<int> storedValues;
    vector<int> uniqueArr;
    for (int i = 0; i < size; ++i) {
        // insert() returns a pair; .second is true only if the value was not already present
        if (storedValues.insert(arr[i]).second)
            uniqueArr.push_back(arr[i]);
    }
    return uniqueArr;
}

But if you are allowed to use the C++ standard library more extensively, you may also consider the other answers using std::sort and std::unique, although they are O(n log n) (instead of the roughly O(n) solution above) and do not preserve the original order of the elements.


If you want a more flexible and std-driven approach with ~O(n) complexity and without destroying the order of the elements, you can transform the above routine into the following std-like algorithm, even if it is a bit too far-fetched for a simple interview question:

#include <algorithm>
#include <iterator>
#include <unordered_set>

template<typename ForwardIterator>
ForwardIterator unordered_unique(ForwardIterator first, ForwardIterator last)
{
    typedef typename std::iterator_traits<ForwardIterator>::value_type value_type;
    std::unordered_set<value_type> unique;
    // shift any element whose value has already been seen towards the end of the range
    return std::remove_if(first, last, 
                          [&unique](const value_type &arg) -> bool
                              { return !unique.insert(arg).second; });
}

You can then apply it like std::unique, in the usual erase-remove way:

std::vector<int> values(...);
values.erase(unordered_unique(values.begin(), values.end()), values.end());

This removes the duplicate values without copying the vector and without needing to sort it beforehand.
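
For illustration, a minimal usage sketch; the sample values are made up for the demo, and it assumes the unordered_unique template defined above is visible in the same translation unit:

#include <iostream>
#include <vector>
// assumes the unordered_unique template defined above is in scope

int main()
{
    std::vector<int> values{1, 2, 2, 4, 2, 5, 6, 5, 7, 1};
    values.erase(unordered_unique(values.begin(), values.end()), values.end());
    for (int v : values)
        std::cout << v << " ";   // prints: 1 2 4 5 6 7
    std::cout << std::endl;
}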

Upvotes: 4

Tony Delroy

Reputation: 106086

In-place removal's nice for speed - something like this (returning the new size):

#include <cstddef>
#include <unordered_set>

template <typename T, size_t N>
size_t keep_unique(T (&array)[N])
{
    std::unordered_set<T> found;
    size_t j = 0;                            // next write position == number of uniques kept so far
    for (size_t i = 0; i < N; ++i)
        if (found.insert(array[i]).second)   // true only for the first occurrence of a value
        {
            if (j != i)                      // (optional) avoid copy to self, as may be slower or unsupported by T
                array[j] = array[i];
            ++j;
        }
    return j;
}

(For larger objects, or those that can't be safely copied, it may be necessary and/or faster and more space-efficient to store T*s in the unordered_set; you must then also provide a dereferencing equality comparison and hash function.)
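
As a hedged sketch of that idea, assuming T specialises std::hash and is equality-comparable (the names deref_hash, deref_equal and unique_pointers below are made up for illustration): to sidestep the subtlety that in-place compaction would move values out from under the stored pointers, this variant only collects pointers to the first occurrences and leaves the array untouched.

#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical helpers: hash and equality that look through a T* at the pointed-to value,
// so the set stores cheap pointers instead of copies of potentially large T objects.
template <typename T>
struct deref_hash {
    std::size_t operator()(const T* p) const { return std::hash<T>()(*p); }
};

template <typename T>
struct deref_equal {
    bool operator()(const T* a, const T* b) const { return *a == *b; }
};

// Collects pointers to the first occurrence of each distinct value,
// without copying any T and without modifying the input array.
template <typename T, std::size_t N>
std::vector<const T*> unique_pointers(const T (&array)[N])
{
    std::unordered_set<const T*, deref_hash<T>, deref_equal<T>> found;
    std::vector<const T*> firsts;
    for (std::size_t i = 0; i < N; ++i)
        if (found.insert(&array[i]).second)   // true only for a value not seen before
            firsts.push_back(&array[i]);
    return firsts;
}

The caller can then copy or move the pointed-to values wherever they are needed, paying at most one copy per distinct value.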

To visualise how this works, consider processing the following input:

1  3  6  3  5  6  0  2  1
         <--+<----+  |
               <-----+

The arrows above represent the minimal in-place compaction necessary to produce the answer:

1  3  6  5  0  2

That's precisely what the algorithm above does, looking at all the elements at [i], and keeping track of where they need to be copied to (and how many non-duplicates there are) in [j].
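
A quick usage sketch with the input from the visualisation (it assumes the keep_unique template above is in scope; the printing loop is just for the demo):

#include <cstddef>
#include <iostream>
// assumes the keep_unique template defined above is in scope

int main()
{
    int values[] = {1, 3, 6, 3, 5, 6, 0, 2, 1};
    std::size_t new_size = keep_unique(values);   // compacts in place, returns the count of uniques
    for (std::size_t i = 0; i < new_size; ++i)
        std::cout << values[i] << " ";            // prints: 1 3 6 5 0 2
    std::cout << std::endl;
}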

Upvotes: 1

K-ballo

Reputation: 81349

Since you are asking in terms of an interview question, I will say that you don't get the job.

#include <algorithm>
#include <iostream>
#include <iterator>

const int size = 10;
int arr[size] = {1,2,2,4,2,5,6,5,7,1};

std::sort( &arr[0], &arr[size] );                   // sort so equal values become adjacent
int* new_end = std::unique( &arr[0], &arr[size] );  // shift duplicates past new_end

std::copy(
    &arr[0], new_end,
    std::ostream_iterator< int >( std::cout, " " )
);

No temporary maps, no temporary vectors, no dynamic memory allocations, and a lot less code, so it's easier both to write and to maintain.

Upvotes: 2

DEgITx

Reputation: 980

#include <algorithm>
#include <vector>

int main()
{
    std::vector<int> vec({1,2,3,2,4,4,5,7,6,6});
    std::sort(vec.begin(), vec.end());                           // make duplicates adjacent
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());   // erase-remove the duplicates
    // vec = {1,2,3,4,5,6,7}
    return 0;
}
//works with C++11
// O(n log n)

Upvotes: 1
