user963241
user963241

Reputation: 7038

Use a std::vector or std::set for sorted and no duplicate elements?

I want a container for holding about a ~3K to 5K int type elements. I want them to be sorted and there should be no duplicates. However, I am not using std::find to find an element in them. I am simply using an iterator to sequentially loop over them.

Would it be better use a std::set here? Or, a vector to first insert all elements, then remove duplicates and sort them?

What's the efficient way to sort and remove duplicates from std::vector if it should be preferred over a std::set?

Upvotes: 1

Views: 1028

Answers (3)

JDługosz
JDługosz

Reputation: 5642

std::set is slow and uses a lot of memory, because it is a dynamic data structure with every node containing pointers.

Get the best of both worlds by using Boost's flat_set. It stores in a sorted vector, but contains the API like set so it automatically removes duplicates and keeps them in sorted order.

I think you'll find that a contiguous vector (as used by boost::containers::flat_set) will be faster than the std::set up to 10,000 elements at least (rule of thumb; modify if your elements are slow to move). You can improve it by using reserve before you start.

std::flag_set is adopted by c++ 23

Upvotes: 3

3CxEZiVlQ
3CxEZiVlQ

Reputation: 38325

The efficient way is sort + unique, that takes O(N * log N) time complexity and O(1) memory usage. sort can be omitted if the vector is already sorted, then the time complexity is O(N).

You find the description and usage examples here std::unique. It must be used after std::sort if the vector is not yet sorted.

Upvotes: 0

Aykhan Hagverdili
Aykhan Hagverdili

Reputation: 29965

You should run a couple benchmarks to see which is better in your case. I did one myself:

#include <vector>
#include <set>
#include <random>
#include <limits.h>

std::vector<int> GetRandArray(std::size_t const sz)
{
  std::mt19937 engine {std::random_device{}()};
  std::uniform_int_distribution<int> dist {INT_MIN, INT_MAX};
  auto gen = [&dist, &engine]  { return dist(engine); };

  std::vector<int> vec(sz);
  generate(begin(vec), end(vec), gen);
  return vec;
}

auto const RandNumbers = GetRandArray(3'000);

static void Vector(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> vec{ RandNumbers.begin(), RandNumbers.end() };
    std::sort(vec.begin(), vec.end());
    vec.erase( std::unique(vec.begin(), vec.end()), vec.end() );
    benchmark::DoNotOptimize(vec);
  }
}
BENCHMARK(Vector);

static void Set(benchmark::State& state) {
  for (auto _ : state) {
    std::set<int> set{ RandNumbers.begin(), RandNumbers.end() };
    benchmark::DoNotOptimize(set);
  }
}
BENCHMARK(Set);

enter image description here

As you can see, with 3000 elements, vector obliterates set.

Upvotes: 1

Related Questions