Kashish Arora

Reputation: 908

Find uncommon elements using hashing

I think this is a fairly common question, but I couldn't find an answer for it using hashing in C++.

I have two arrays of the same length, which contain some elements, for example:

A={5,3,5,4,2}
B={3,4,1,2,1}

Here, the uncommon elements are: {5,5,1,1}

I have tried this approach: sorting both arrays and then iterating over them with a while loop:

while (i < n && j < n) {
    if (a[i] < b[j])
        uncommon[k++] = a[i++];
    else if (a[i] > b[j])
        uncommon[k++] = b[j++];
    else {          // common element: skip it in both arrays
        i++;
        j++;
    }
}
// leftover elements occur in only one array, so they are all uncommon
while (i < n)
    uncommon[k++] = a[i++];
while (j < n)
    uncommon[k++] = b[j++];

and I am getting the correct answer with this. However, I want a better approach in terms of time complexity, since sorting both arrays every time can be computationally expensive.

I tried to do hashing but couldn't figure it out entirely.

To insert elements from arr1[]:

set<int> uncommon;
for (int i = 0; i < n1; i++)
    uncommon.insert(arr1[i]);

To compare arr2[] elements:

    for (int i = 0; i < n2; i++) 
        if (uncommon.find(arr2[i]) != uncommon.end()) 

Now, what I am unable to do is to put into uncommon[] only those elements that appear in exactly one of the two arrays.

Thank you!

Upvotes: 0

Views: 149

Answers (4)

smyatkin_max

Reputation: 365

First of all, std::set does not have anything to do with hashing. Sets and maps are ordered containers; implementations may differ, but most likely it is a balanced binary search tree (typically a red-black tree). Whatever you do, you won't get faster than O(n log n) with them, which is the same complexity as sorting. If you're fine with O(n log n) and sorting, I'd strongly advise just using the std::set_symmetric_difference algorithm (https://en.cppreference.com/w/cpp/algorithm/set_symmetric_difference); it requires two sorted ranges.
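
For example, here is a minimal runnable sketch of that approach, using the arrays from the question (note that both ranges must be sorted first):

    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <vector>

    int main() {
        std::vector<int> a{5, 3, 5, 4, 2};
        std::vector<int> b{3, 4, 1, 2, 1};

        // set_symmetric_difference requires sorted input ranges
        std::sort(a.begin(), a.end());
        std::sort(b.begin(), b.end());

        std::vector<int> uncommon;
        std::set_symmetric_difference(a.begin(), a.end(),
                                      b.begin(), b.end(),
                                      std::back_inserter(uncommon));

        for (int e : uncommon)
            std::cout << e << ' ';  // prints: 1 1 5 5
    }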

But if you insist on an implementation relying on hashing, you should use std::unordered_set or std::unordered_map. This way you can be faster than O(n log n): you can get your answer in O(n + m) average time, where n = a.size() and m = b.size(). Create two unordered_sets, hashed_a and hashed_b, and in two loops check which elements of hashed_a are not in hashed_b, and which elements of hashed_b are not in hashed_a. Here is some pseudocode:

create hashed_a and hashed_b
create set_result // for the result
for (a_v : hashed_a) 
  if (a_v not in hashed_b)
    set_result.insert(a_v)
for (b_v : hashed_b) 
  if (b_v not in hashed_a)
    set_result.insert(b_v)
return set_result // it holds the symmetric difference, which you need

UPDATE: as noted in the comments, my answer doesn't account for duplicates. The easiest way to handle duplicates is to use an unordered_map<int, int>, with the elements as keys and the number of occurrences as values.
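
For instance, a minimal sketch of that counting variant (the helper name uncommon_elements is just for illustration): increment the count for every element of a, decrement it for every element of b, and any nonzero balance is uncommon that many times.

    #include <cstdlib>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    // Illustrative helper: +1 per occurrence in a, -1 per occurrence in b;
    // a nonzero balance means the value is uncommon |balance| times.
    std::vector<int> uncommon_elements(const std::vector<int>& a,
                                       const std::vector<int>& b) {
        std::unordered_map<int, int> balance;
        for (int x : a) ++balance[x];
        for (int x : b) --balance[x];

        std::vector<int> result;
        for (const auto& [value, count] : balance)
            for (int k = 0; k < std::abs(count); ++k)
                result.push_back(value);
        return result;
    }

    int main() {
        for (int e : uncommon_elements({5, 3, 5, 4, 2}, {3, 4, 1, 2, 1}))
            std::cout << e << ' ';  // e.g. 5 5 1 1 (iteration order is unspecified)
    }

This runs in O(n + m) on average, and duplicates fall out naturally from the counts.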

Upvotes: 3

Mr.HITMAN

Reputation: 157

The question can be solved in O(n log n) time complexity.

ALGORITHM

  • Sort both arrays with merge sort in O(n log n) complexity. You can also use the sort function, for example sort(array1.begin(), array1.end()).

  • Now use the two-pointer method to skip all elements common to both arrays.

  • Program for the above method:

    int i = 0, j = 0;
    while (i < array1.size() && j < array2.size()) {

        // If not common, print the smaller one
        if (array1[i] < array2[j]) {
            cout << array1[i] << " ";
            i++;
        }
        else if (array2[j] < array1[i]) {
            cout << array2[j] << " ";
            j++;
        }

        // Skip common element
        else {
            i++;
            j++;
        }
    }

    // Leftover elements of either array occur only in that array,
    // so they are uncommon too
    while (i < array1.size())
        cout << array1[i++] << " ";
    while (j < array2.size())
        cout << array2[j++] << " ";
  • The complexity of the above program is O(array1.size() + array2.size()), i.e. O(2n) in the worst case.

  • The above program prints the uncommon elements as output. If you want to store them, just create a vector and push them into it (see the sketch after this list).

  • Original Problem LINK
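
As mentioned above, here is a runnable sketch of the whole method with the results collected into a vector instead of printed (the vector name uncommon is just for illustration):

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> array1{5, 3, 5, 4, 2};
        std::vector<int> array2{3, 4, 1, 2, 1};

        std::sort(array1.begin(), array1.end());
        std::sort(array2.begin(), array2.end());

        std::vector<int> uncommon;
        std::size_t i = 0, j = 0;
        while (i < array1.size() && j < array2.size()) {
            if (array1[i] < array2[j])
                uncommon.push_back(array1[i++]);  // only in array1
            else if (array2[j] < array1[i])
                uncommon.push_back(array2[j++]);  // only in array2
            else {
                i++;  // common element: skip it in both arrays
                j++;
            }
        }
        while (i < array1.size())  // leftovers are uncommon too
            uncommon.push_back(array1[i++]);
        while (j < array2.size())
            uncommon.push_back(array2[j++]);

        for (int e : uncommon)
            std::cout << e << ' ';  // prints: 1 1 5 5
    }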

Upvotes: 1

TohpiMou

Reputation: 56

A solution without sorting (and without hashing, but you seem to care more about complexity than about the hashing itself) is to notice the following: an uncommon element e is an element that occurs in exactly one of the two multisets.

This means that the multiset of all uncommon elements is the union between 2 multisets:

  1. S1 = the elements of A that are not in B
  2. S2 = the elements of B that are not in A

Using std::set_difference twice, you get:

#include <set>
#include <vector>
#include <iostream>
#include <algorithm>
int main() {
    std::multiset<int> ms1{5, 3, 5, 4, 2};
    std::multiset<int> ms2{3, 4, 1, 2, 1};

    std::vector<int> v;
    // S1: elements of ms1 that are not in ms2
    std::set_difference(ms1.begin(), ms1.end(), ms2.begin(), ms2.end(), std::back_inserter(v));
    // S2: elements of ms2 that are not in ms1
    std::set_difference(ms2.begin(), ms2.end(), ms1.begin(), ms1.end(), std::back_inserter(v));

    for (int e : v)
        std::cout << e << ' ';

    return 0;
}

Output:

5 5 1 1 

The complexity of this code is at most 4(N1+N2) - 2 comparisons, where N1 and N2 are the sizes of the multisets (each std::set_difference call performs at most 2(N1+N2) - 1 comparisons).

Links:

set_difference: https://en.cppreference.com/w/cpp/algorithm/set_difference

compiler explorer: https://godbolt.org/z/o3KGbf

Upvotes: 1

Hack06

Reputation: 1092

First, you need to find a way to distinguish between equal values contained in the same array (for example, the two 5s in the first array and the two 1s in the second array). This is the key to reducing the overall complexity; otherwise you can't do better than O(n log n). A good possible algorithm for this task is to create a wrapper object to hold each actual value and put pointers to those wrapper objects into your arrays, so the pointer addresses serve as unique identifiers for the objects. This wrapping costs you just O(n1+n2) operations, but also O(n1+n2) additional space.

Now both arrays contain only elements unique within each array, and you want to find the uncommon ones, i.e. (the union of both arrays' elements) minus (the intersection of both arrays' elements). Therefore, all you need to do is push all the elements of the first array into a hash map (complexity O(n1)), and then push all the elements of the second array into the same hash map (complexity O(n2)) while detecting collisions (equality of an element from the first array with an element from the second). This comparison step requires O(n2) comparisons in the worst case. For maximum performance you could check the sizes of the arrays before pushing and swap them so that the first pass runs over the longer array. The overall complexity is O(n1+n2) pushes (hashings) and O(n2) comparisons.

The implementation is the most boring part, so I'll leave it to you ;)
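
Since the answer leaves the implementation open, here is one possible sketch. Instead of the wrapper objects described above, it uses std::unordered_multiset, which already keeps duplicate occurrences distinct, so it is the same collision-detection idea with less machinery:

    #include <iostream>
    #include <unordered_set>
    #include <vector>

    int main() {
        std::vector<int> a{5, 3, 5, 4, 2};
        std::vector<int> b{3, 4, 1, 2, 1};

        // Push all elements of the first array into the hash container.
        std::unordered_multiset<int> hashed(a.begin(), a.end());

        std::vector<int> uncommon;
        for (int x : b) {
            auto it = hashed.find(x);
            if (it != hashed.end())
                hashed.erase(it);       // collision: one matched occurrence is common
            else
                uncommon.push_back(x);  // x occurs only in b
        }
        // Whatever survived the collisions occurs only in a.
        uncommon.insert(uncommon.end(), hashed.begin(), hashed.end());

        for (int e : uncommon)
            std::cout << e << ' ';  // e.g. 1 1 5 5 (order is unspecified)
    }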

Upvotes: 1
