Reputation: 131
I'll try to explain my problem with the following example:
vector<pair<string, string>> a = { { "A","1" }, {"B","2" },{ "C","3" },{ "D","3" },{ "E","5" } };
vector<pair<string, string>> b = { { "A","1" },{ "B","3" },{ "D","3" },{ "E","4" },{ "Z","5" } };
What will be the most efficient way to erase duplicates and get the output into the same vectors? The number of pairs is pretty large, lets say about 100 000.
Both vectors are sorted by the first element.
vector<pair<string, string>> a = { { "B","2" },{ "C","3" },{ "E","5" } };
vector<pair<string, string>> b = { { "B","3" },{ "E","4" },{ "Z","5" } };
The thing is, I need to compare this vectors after removing duplicates.
First element in the pair is the file path, and second is the checksum for it.
So for example if i have "B","2"
in the first container, and "B","3"
is the second, i can list this file as "modified". I am open to use std::set
if this would make this problem easier.
Upvotes: 2
Views: 2836
Reputation: 1173
Using running indices will give you O(len(a) + len(b)) time complexity and O(1) additional space (Given a
and b
are already sorted)
void removeDuplicate(vector<pair<string, string>>& a, vector<pair<string, string>>& b) {
//Add these two lines if there can be duplicates in a or b themselves.
//a.erase(std::unique(a.begin(), a.end()), a.end());
//b.erase(std::unique(b.begin(), b.end()), b.end());
size_t i = 0;
size_t j = 0;
size_t p1 = 0;
size_t p2 = 0;
while(i < a.size() && j < b.size()) {
if(a[i] == b[j]) {
i++;
j++;
} else if (a[i] > b[j]) {
b[p2++] = b[j++];
} else if (b[j] > a[i]) {
a[p1++] = a[i++];
}
}
while(i < a.size()) {
a[p1++] = a[i++];
}
while(j < b.size()) {
b[p2++] = b[j++];
}
a.erase(a.begin()+p1, a.end());
b.erase(b.begin()+p2, b.end());
}
Upvotes: 6
Reputation: 62626
I don't think any of the standard library algorithms will be directly helpful here.
We first check to see if we should remove (from both), otherwise we advance the iterator pointing to the lesser value and continue.
for (auto ait = a.begin(), bit = b.begin(); ait != a.end() && bit != b.end();)
{
if (*ait == *bit)
{
// Potenitally multiple duplicate values
ait = a.erase(std::remove(ait, a.end(), *ait), a.end());
bit = b.erase(std::remove(bit, b.end(), *bit), b.end());
}
else if (*ait < *bit) ++ait;
else ++bit;
}
Upvotes: 2
Reputation: 1374
You can use some algorithms from STL library to help solve this task. Firstly find same elements and put them in temporary vector, then remove these elements from each vector, see code example:
vector<pair<string, string>> a = { { "A","1" }, {"B","2" },{ "C","3" },{ "D","3" },{ "E","5" } };
vector<pair<string, string>> b = { { "A","1" },{ "B","3" },{ "D","3" },{ "E","4" },{ "Z","5" } };
//Vector to hold same elements
vector<pair<string, string>> same_elements {};
//Fill same_elements vector
std::for_each(a.begin(), a.end(), [&same_elements, b]( pair<string, string>& el )
{
if( find(b.begin(), b.end(), el) != b.end() )
{
same_elements.push_back(el);
}
});
//Remove same elements from a and b
std::for_each(same_elements.begin(), same_elements.end(), [&a, &b]( pair<string, string>& el_to_delete )
{
auto It_a = find(a.begin(), a.end(), el_to_delete);
if( It_a != a.end() )
{
a.erase(It_a);
}
auto It_b = find(b.begin(), b.end(), el_to_delete);
if( It_b != b.end() )
{
b.erase(It_b);
}
});
I used std::for_each
to iterate over every elements of vector, std::find
to find needed elements in vectors and erase
vector method to erase same element from vector by iterator.
Upvotes: 0