Reputation: 2335
I have a vector of strings:
std::vector<std::string> fName
which holds a list of file names <a,b,c,d,a,e,e,d,b>
.
I want to get rid of all the files that have duplicates and want to retain only the files that do not have duplicates in the vector.
for(size_t l = 0; l < fName.size(); l++)
{
strFile = fName.at(l);
for(size_t k = 1; k < fName.size(); k++)
{
strFile2 = fName.at(k);
if(strFile.compare(strFile2) == 0)
{
fName.erase(fName.begin() + l);
fName.erase(fName.begin() + k);
}
}
}
This is removing a few of the duplicate but still has a few duplicates left, need help in debugging.
Also my input looks like <a,b,c,d,e,e,d,c,a>
and my expected output is <b>
as all other files b,c,d,e have duplicates they are removed.
Upvotes: 15
Views: 25050
Reputation: 58677
You can eliminate duplicates in O(log n) runtime and O(n) space:
std::set<std::string> const uniques(vec.begin(), vec.end());
vec.assign(uniques.begin(), uniques.end());
But the O(log n) runtime is a bit misleading, because the O(n) space actually does O(n) dynamic allocations, which are expensive in terms of speed. The elements must also be comparable (here with operator<()
, which std::string
supports as a lexicographical compare).
If you want to store only unique elements:
template<typename In>
In find_unique(In first, In last)
{
if( first == last ) return last;
In tail(first++);
int dupes = 0;
while( first != last ) {
if( *tail++ == *first++ ) ++dupes;
else if( dupes != 0 ) dupes = 0;
else return --tail;
}
return dupes == 0 ? tail : last;
}
The algorithm above takes a sorted range and returns the first unique element, in linear time and constant space. To get all uniques in a container you might use it like so:
auto pivot = vec.begin();
for( auto i(find_unique(vec.begin(), vec.end()));
i != vec.end();
i = find_unique(++i, vec.end())) {
std::iter_swap(pivot++, i);
}
vec.erase(pivot, vec.end());
Upvotes: 0
Reputation: 4962
#include <algorithm>
template <typename T>
void remove_duplicates(std::vector<T>& vec)
{
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}
Note: this require that T has operator<
and operator==
defined.
Why it work?
std::sort sort the elements using their less-than comparison operator
std::unique removes the duplicate consecutive elements, comparing them using their equal comparison operator
What if i want only the unique elements?
Then you better use std::map
#include <algorithm>
#include <map>
template <typename T>
void unique_elements(std::vector<T>& vec)
{
std::map<T, int> m;
for(auto p : vec) ++m[p];
vec.erase(transform_if(m.begin(), m.end(), vec.begin(),
[](std::pair<T,int> const& p) {return p.first;},
[](std::pair<T,int> const& p) {return p.second==1;}),
vec.end());
}
See: here.
Upvotes: 26
Reputation: 97948
#include <algorithms>
template <typename T>
remove_duplicates(std::vector<T>& vec)
{
std::vector<T> tvec;
uint32_t size = vec.size();
for (uint32_t i; i < size; i++) {
if (std::find(vec.begin() + i + 1, vec.end(), vec[i]) == vector.end()) {
tvec.push_back(t);
} else {
vec.push_back(t);
}
vec = tvec; // : )
}
}
Upvotes: 2
Reputation: 103713
If I understand your requirements correctly, and I'm not entirely sure that I do. You want to only keep the elements in your vector of which do not repeat, correct?
Make a map of strings to ints, used for counting occurrences of each string. Clear the vector, then copy back only the strings that only occurred once.
map<string,int> m;
for (auto & i : v)
m[i]++;
v.clear();
for (auto & i : m)
if(i.second == 1)
v.push_back(i.first);
Or, for the compiler-feature challenged:
map<string,int> m;
for (vector<string>::iterator i=v.begin(); i!=v.end(); ++i)
m[*i]++;
v.clear();
for (map<string,int>::iterator i=m.begin(); i!=m.end(); ++i)
if (i->second == 1)
v.push_back(i->first);
Upvotes: 4