Ironwing
Ironwing

Reputation: 131

Removing duplicates in two vectors

I'll try to explain my problem with the following example:

vector<pair<string, string>> a = { { "A","1" }, {"B","2" },{ "C","3" },{ "D","3" },{ "E","5" } };

vector<pair<string, string>> b = { { "A","1" },{ "B","3" },{ "D","3" },{ "E","4" },{ "Z","5" } };

What will be the most efficient way to erase duplicates and get the output into the same vectors? The number of pairs is pretty large, lets say about 100 000.

Both vectors are sorted by the first element.

vector<pair<string, string>> a = { { "B","2" },{ "C","3" },{ "E","5" } };

vector<pair<string, string>> b = { { "B","3" },{ "E","4" },{ "Z","5" } };

The thing is, I need to compare this vectors after removing duplicates. First element in the pair is the file path, and second is the checksum for it. So for example if i have "B","2" in the first container, and "B","3" is the second, i can list this file as "modified". I am open to use std::set if this would make this problem easier.

Upvotes: 2

Views: 2836

Answers (3)

gchen
gchen

Reputation: 1173

Using running indices will give you O(len(a) + len(b)) time complexity and O(1) additional space (Given a and b are already sorted)

void removeDuplicate(vector<pair<string, string>>& a, vector<pair<string, string>>& b) {

    //Add these two lines if there can be duplicates in a or b themselves.
    //a.erase(std::unique(a.begin(), a.end()), a.end());
    //b.erase(std::unique(b.begin(), b.end()), b.end());

    size_t i = 0;
    size_t j = 0;

    size_t p1 = 0;
    size_t p2 = 0;

    while(i < a.size() && j < b.size()) {
        if(a[i] == b[j]) {
            i++;
            j++;
        } else if (a[i] > b[j]) {
            b[p2++] = b[j++]; 
        } else if (b[j] > a[i]) {
            a[p1++] = a[i++];
        }
    }

    while(i < a.size()) {
        a[p1++] = a[i++]; 
    }

    while(j < b.size()) {
        b[p2++] = b[j++];
    }

    a.erase(a.begin()+p1, a.end());
    b.erase(b.begin()+p2, b.end());
}

Upvotes: 6

Caleth
Caleth

Reputation: 62626

I don't think any of the standard library algorithms will be directly helpful here.

We first check to see if we should remove (from both), otherwise we advance the iterator pointing to the lesser value and continue.

for (auto ait = a.begin(), bit = b.begin(); ait != a.end() && bit != b.end();)
{
    if (*ait == *bit)
    {
        // Potenitally multiple duplicate values
        ait = a.erase(std::remove(ait, a.end(), *ait), a.end());
        bit = b.erase(std::remove(bit, b.end(), *bit), b.end());
    }
    else if (*ait < *bit) ++ait;
    else ++bit;
}

Upvotes: 2

Alexey Usachov
Alexey Usachov

Reputation: 1374

You can use some algorithms from STL library to help solve this task. Firstly find same elements and put them in temporary vector, then remove these elements from each vector, see code example:

vector<pair<string, string>> a = { { "A","1" }, {"B","2" },{ "C","3" },{ "D","3" },{ "E","5" } };
vector<pair<string, string>> b = { { "A","1" },{ "B","3" },{ "D","3" },{ "E","4" },{ "Z","5" } };

//Vector to hold same elements
vector<pair<string, string>> same_elements {};

//Fill same_elements vector
std::for_each(a.begin(), a.end(), [&same_elements, b]( pair<string, string>& el )
              {
                  if( find(b.begin(), b.end(), el) != b.end() )
                  {
                      same_elements.push_back(el);
                  }
              });

//Remove same elements from a and b
std::for_each(same_elements.begin(), same_elements.end(), [&a, &b]( pair<string, string>& el_to_delete )
              {
                  auto It_a = find(a.begin(), a.end(), el_to_delete);
                  if( It_a != a.end() )
                  {
                      a.erase(It_a);
                  }

                  auto It_b = find(b.begin(), b.end(), el_to_delete);
                  if( It_b != b.end() )
                  {
                      b.erase(It_b);
                  }
              });

I used std::for_each to iterate over every elements of vector, std::find to find needed elements in vectors and erase vector method to erase same element from vector by iterator.

Upvotes: 0

Related Questions