Reputation: 75
I am trying to remove duplicates from an vector by using an unordered_set. but my design creates an unordered_set that doesn't maintain the ordering correctly. in this example the "z" is not at the end. What am I doing wrong? thank you in advance.
EDIT: sorry if I wasn't clear with what I was looking for. I want the output to be "e,d,a,b,c,z" I want to keep original ordering but remove duplicates. I currently have it working using about 3 different for loops and an extra copy of the init vector. I was just looking for a STL function that was cleaner if possible.
output produced: e d a b c a a a a b b b b c z printing unordered set e d a z b c
#include <iostream>
#include <iterator>
#include <algorithm>
#include <string>
#include <unordered_set>
using namespace std;
int main() {
vector<string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };
for (vector<string>::iterator it = terminals.begin(); it != terminals.end(); it++) // print given vector
cout << *it << " ";
cout << endl;
unordered_set<string> newSet;
copy(terminals.begin(), terminals.end(), inserter(newSet, newSet.end()));
cout << "printing unordered set" << endl;
for (unordered_set<string>::iterator it = newSet.begin(); it != newSet.end(); it++)
cout << *it << " ";
cout << endl;
//system("pause");
return 0;
}
Upvotes: 3
Views: 442
Reputation: 117288
Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its value. This allows fast access to individual elements, since once a hash is computed, it refers to the exact bucket the element is placed into.
If you need the unique terminals to be ordered, use a std::set:
#include <iostream>
#include <vector>
#include <string>
#include <set>
int main() {
std::vector<std::string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };
for(const std::string& terminal : terminals) // print given vector
std::cout << terminal << " ";
std::cout << "\n";;
// populate the set directly from the vectors iterators:
std::set<std::string> newSet(terminals.begin(), terminals.end());;
std::cout << "printing the (ordered) set:" << "\n";;
for(const std::string& terminal : newSet)
std::cout << terminal << " ";
std::cout << "\n";;
}
If you want to maintain the original order, you can't use either set as your final storage, but you can use a std::unordered_set
as a cache/blacklist for values you've already inserted in your final storage.
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <unordered_set>
int main() {
std::vector<std::string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };
for(const std::string& terminal : terminals) // print given vector
std::cout << terminal << " ";
std::cout << "\n";;
std::vector<std::string> newSet; // not really a set anymore
std::unordered_set<std::string> cache; // blacklist
// try to insert all terminals and only when an insert is successful,
// put the terminal in newSet
std::for_each(terminals.begin(), terminals.end(),
[&](const std::string& terminal) {
auto [it, inserted] = cache.insert(terminal);
if(inserted)
newSet.push_back(terminal);
}
);
std::cout << "printing the vector of unique terminals:" << "\n";;
for(const std::string& terminal : newSet)
std::cout << terminal << " ";
std::cout << "\n";;
}
If you want the original order and don't mind making changes directly to the original terminals
vector, you can use std::remove_if
combined with an unordered_set
which is nice since it doesn't require a new vector. This is an annotated variant of @Marek R's answer:
Read this first: Erase–remove idiom
int main() {
std::vector<std::string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };
for(const std::string& terminal : terminals) // print given vector
std::cout << terminal << " ";
std::cout << "\n";;
std::unordered_set<std::string> cache; // blacklist
// remove_if() moves all entries in your container, for which the
// UnaryPredicate(*) returns true, to the end of the container. It returns
// an iterator pointing to the first element in the vector that was
// moved - which is a suitable starting point for a subsequent erase().
//
// (*) UnaryPredicate: A callable that returns true or false given a single
// value.
// auto past_new_end = std::vector<std::string>::iterator past_new_end
auto past_new_end = std::remove_if(terminals.begin(), terminals.end(),
// this lambda is the UnaryPredicate
[&](const std::string& terminal) {
// insert returns a std::pair<Iterator, bool>
// where the bool (.second in the pair) is false
// if the value was not inserted (=it was already present)
return cache.insert(terminal).second == false;
}
);
std::cout << "display all the entries (now with unspecified values) "
"that will be erased:\n";
std::copy(past_new_end, terminals.end(),
std::ostream_iterator<std::string>(std::cout, "<"));
std::cout << "\n";
// erase all the moved entries
terminals.erase(past_new_end, terminals.end());
std::cout << "printing the unique terminals:" << "\n";;
for(const std::string& terminal : terminals)
std::cout << terminal << " ";
std::cout << "\n";;
}
Upvotes: 5
Reputation: 37607
I am trying to remove duplicates from an vector by using an unordered_set.
Why do you assumed that unordered_set
conserves any kind of order?
Name clearly states that there is no any particular order.
You should use unordered_set
only to keep track if item was already found in sequence. Based on that you can remove item from sequence, so this should go like this:
void removeDuplicates(Data &data)
{
std::unordered_set<std::string> foundItems;
auto newEnd = std::remove_if(data.begin(), data.end(), [&foundItems](const auto &s)
{
return !foundItems.insert(s).second;
});
data.erase(newEnd, data.end());
}
https://wandbox.org/permlink/T24UfiLQep0XUQhQ
Upvotes: 0
Reputation: 104
Looks like you want to use a (ordered) set.
Edit : Looks like you don't, actually. A std::vector
could work, but it's probably not the cleanest workaround.
Upvotes: 2
Reputation: 711
You could also use an unordered map and then store the item as as key to the map and the index as the corresponding value to that key.
Upvotes: 0
Reputation: 490108
If you want to maintain the original order, but enforce uniqueness, you probably want to:
If you want the outputs ordered (so, in you example, the output would be "a b c d e z"), then you could either insert the items in an std::set
, or else you could use std::sort
followed by std::unique
to get exactly one of each unique element in the input.
Upvotes: 2