Reputation: 10355
I have a very large std::vector of std::vectors, each containing a fixed number of unsigned integers.
All inner vectors of uints are sorted in ascending order.
My current way to eliminate duplicate vectors is:
unsigned int i = 0;
while ( i < new_combs.size() )
{
    unsigned int j = i + 1;
    while ( j < new_combs.size() )
    {
        // Compare new_combs[i] and new_combs[j] element by element.
        unsigned int k = 0;
        while ( k < new_combs.at(i).size() && new_combs.at(i).at(k) == new_combs.at(j).at(k) )
            ++k;
        // If we ran off the end, every element matched: j is a duplicate of i.
        if ( k == new_combs.at(j).size() )
            new_combs.erase(new_combs.begin() + j);
        else
            ++j;
    }
    ++i;
}
Here, new_combs is the vector of vectors described above.
Is there a more efficient way to eliminate duplicates if the vector of vectors is unsorted?
Upvotes: 1
Views: 1828
Reputation: 23629
There are several elements in your code that ring my alarm bell regarding performance.
First, you are using vectors. Erasing elements from the middle of a vector is always slow, because everything behind the erased element has to be shifted forward. You might consider using a different container (std::list) or adjusting your code so that a special value stands for "nothing" (e.g. zero or -1).
Second, you could use an std::set or std::unordered_set to keep values that you already encountered. That way, you only have to loop over your vectors once.
EDIT: Forget this answer. I misread the question and thought that duplicate values (not duplicate vectors) had to be removed.
Nevertheless, some reactions to the comments:
Upvotes: 0
Reputation: 52157
I agree with Luchian Grigore's answer, but you might also consider converting the whole outer vector to an unordered_set, which is an O(n) operation provided the hashes of the sub-vectors are not too lopsided (as opposed to O(n*log(n)) on average for sorting). You can even store pointers to the sub-vectors in your unordered_set, to avoid unnecessary copying. This can be an important performance difference for large amounts of data.
This example illustrates the basic idea of using your own hash function and pointers (it deals with a vector of strings and uses an unordered_map, not an unordered_set, but you should be able to adapt it fairly easily to your needs).
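A minimal sketch of the idea, assuming a C++11 compiler; the functor names VecPtrHash and VecPtrEqual are hypothetical, and the multiply-by-31 hash combiner is an arbitrary illustrative choice, not a recommendation:

#include <cstddef>
#include <iostream>
#include <unordered_set>
#include <vector>

// Hypothetical hash functor: a simple multiply-and-add combiner
// over the pointed-to vector. The constant 31 is arbitrary.
struct VecPtrHash {
    std::size_t operator()(const std::vector<unsigned int>* v) const {
        std::size_t h = 0;
        for (unsigned int x : *v)
            h = h * 31 + x;
        return h;
    }
};

// Equality must also dereference: compare the vectors, not the pointers.
struct VecPtrEqual {
    bool operator()(const std::vector<unsigned int>* a,
                    const std::vector<unsigned int>* b) const {
        return *a == *b;
    }
};

int main() {
    std::vector<std::vector<unsigned int>> new_combs =
        { {1, 2, 3}, {4, 5, 6}, {1, 2, 3} };

    // The set stores only pointers into new_combs, so no sub-vector is copied.
    std::unordered_set<const std::vector<unsigned int>*,
                       VecPtrHash, VecPtrEqual> seen;

    std::vector<std::vector<unsigned int>> deduped;
    for (const auto& comb : new_combs)
        if (seen.insert(&comb).second)  // .second is true on first occurrence
            deduped.push_back(comb);

    new_combs.swap(deduped);
    std::cout << new_combs.size() << '\n';  // prints 2
}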
Upvotes: 0
Reputation: 71009
Not much you can do if the vector is unsorted. If it is sorted, however, you can use std::unique from <algorithm> (note that it only removes consecutive duplicates):
new_combs.erase(unique(new_combs.begin(), new_combs.end()), new_combs.end());
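A minimal self-contained sketch of that one-liner, assuming the outer vector is already sorted lexicographically so that duplicates sit next to each other:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    // Already sorted, so duplicates are adjacent.
    std::vector<std::vector<unsigned int>> new_combs =
        { {1, 2}, {1, 2}, {3, 4} };

    // std::unique moves the unique elements to the front and returns the
    // new logical end; erase then discards the leftover tail.
    new_combs.erase(std::unique(new_combs.begin(), new_combs.end()),
                    new_combs.end());

    std::cout << new_combs.size() << '\n';  // prints 2
}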
Upvotes: 2
Reputation: 36059
Asymptotically, your algorithm seems like the usual O(n) implementation and is therefore optimal. (Even though I didn't understand your diagonalization strategy with i and j, or why you only ever erase but never move elements. Your code is highly unclear.)
However, you are duplicating the STL, and a shorter version of a unique-ing loop is:
// Functor that removes consecutive duplicates from any container.
struct unique {
    template <class C>
    void operator()( C& c ) {
        c.erase( std::unique( c.begin(), c.end() ), c.end() );
    }
};

std::for_each( new_combs.begin(), new_combs.end(), unique() );
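Assuming a C++11 compiler is available, the same loop can be written with a lambda instead of the hand-rolled functor:

// C++11 lambda equivalent of the functor above.
std::for_each( new_combs.begin(), new_combs.end(),
               []( std::vector<unsigned int>& c ) {
                   c.erase( std::unique( c.begin(), c.end() ), c.end() );
               } );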
Upvotes: 0
Reputation: 4456
Have you considered using std::set? It is ordered and doesn't allow duplicates to begin with.
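A minimal sketch of that approach (assuming C++11 for the brace-initializer inserts; std::vector already provides the lexicographic operator< that std::set needs):

#include <iostream>
#include <set>
#include <vector>

int main() {
    std::set<std::vector<unsigned int>> combs;
    combs.insert({1, 2, 3});
    combs.insert({1, 2, 3});  // duplicate: the insert is silently ignored
    combs.insert({4, 5, 6});

    std::cout << combs.size() << '\n';  // prints 2
}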
Upvotes: 3
Reputation: 258678
A shorter way would be using <algorithm>:
std::sort(new_combs.begin(), new_combs.end());
new_combs.erase(std::unique(new_combs.begin(), new_combs.end()), new_combs.end());
Unless you specifically need a std::vector, you can use a std::set to avoid duplicates.
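For instance, a set can be built straight from the existing data; this is a sketch reusing the new_combs name from the question, and the duplicates disappear during construction:

#include <set>
// ...
std::set<std::vector<unsigned int>> unique_combs(new_combs.begin(),
                                                 new_combs.end());
// unique_combs now holds each distinct combination exactly once, in sorted order.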
Upvotes: 9