Reputation: 7038
I want a container for holding about a ~3K to 5K int
type elements. I want them to be sorted and there should be no duplicates. However, I am not using std::find
to find an element in them. I am simply using an iterator to sequentially loop over them.
Would it be better use a std::set
here? Or, a vector to first insert all elements, then remove duplicates and sort them?
What's the efficient way to sort and remove duplicates from std::vector
if it should be preferred over a std::set
?
Upvotes: 1
Views: 1028
Reputation: 5642
std::set
is slow and uses a lot of memory, because it is a dynamic data structure with every node containing pointers.
Get the best of both worlds by using Boost's flat_set
. It stores in a sorted vector, but contains the API like set
so it automatically removes duplicates and keeps them in sorted order.
I think you'll find that a contiguous vector
(as used by boost::containers::flat_set
) will be faster than the std::set
up to 10,000 elements at least (rule of thumb; modify if your elements are slow to move). You can improve it by using reserve
before you start.
std::flag_set
is adopted by c++ 23
Upvotes: 3
Reputation: 38325
The efficient way is sort + unique, that takes O(N * log N) time complexity and O(1) memory usage. sort can be omitted if the vector is already sorted, then the time complexity is O(N).
You find the description and usage examples here std::unique. It must be used after std::sort if the vector is not yet sorted.
Upvotes: 0
Reputation: 29965
You should run a couple benchmarks to see which is better in your case. I did one myself:
#include <vector>
#include <set>
#include <random>
#include <limits.h>
std::vector<int> GetRandArray(std::size_t const sz)
{
std::mt19937 engine {std::random_device{}()};
std::uniform_int_distribution<int> dist {INT_MIN, INT_MAX};
auto gen = [&dist, &engine] { return dist(engine); };
std::vector<int> vec(sz);
generate(begin(vec), end(vec), gen);
return vec;
}
auto const RandNumbers = GetRandArray(3'000);
static void Vector(benchmark::State& state) {
for (auto _ : state) {
std::vector<int> vec{ RandNumbers.begin(), RandNumbers.end() };
std::sort(vec.begin(), vec.end());
vec.erase( std::unique(vec.begin(), vec.end()), vec.end() );
benchmark::DoNotOptimize(vec);
}
}
BENCHMARK(Vector);
static void Set(benchmark::State& state) {
for (auto _ : state) {
std::set<int> set{ RandNumbers.begin(), RandNumbers.end() };
benchmark::DoNotOptimize(set);
}
}
BENCHMARK(Set);
As you can see, with 3000 elements, vector obliterates set.
Upvotes: 1