Reputation: 9240

What container to store unique values?

I've got the following problem. I have a game which runs on average 60 frames per second. Each frame I need to store values in a container and there must be no duplicates.

It probably has to store fewer than 100 items per frame, but the number of insert calls will be a lot higher (and many will be rejected because the values must be unique). Only at the end of the frame do I need to traverse the container. So about 60 traversals of the container per second, but a lot more insertions.

Keep in mind the items to store are simple integers.

There are a bunch of containers I can use for this but I cannot make up my mind what to pick. Performance is the key issue for this.

Some pros/cons that I've gathered:

vector

(PRO): Contiguous memory, a huge factor.
(PRO): Memory can be reserved first, very few allocations/deallocations afterwards
(CON): No alternative but to traverse the container (std::find) on each insert() to check for duplicates? The comparison is simple though (integers), and the whole container can probably fit in the cache

set

(PRO): Simple, clearly meant for this
(CON): Not constant insert-time
(CON): A lot of allocations/deallocations per frame
(CON): Not contiguous memory. Traversing a set of hundreds of objects means jumping around a lot in memory.

unordered_set

(PRO): Simple, clearly meant for this
(PRO): Average case constant time insert
(CON): Seeing as I store integers, the hash operation is probably a lot more expensive than anything else
(CON): A lot of allocations/deallocations per frame
(CON): Not contiguous memory. Traversing a set of hundreds of objects means jumping around a lot in memory.

I'm leaning towards the vector route because of memory access patterns, even though set is clearly meant for this problem. What is unclear to me is whether traversing the vector on each insert is more costly than the allocations/deallocations (especially considering how often this must be done) and the memory lookups of set.

I know it ultimately comes down to profiling each case, but as a head start, or just theoretically, what would probably be best in this scenario? Are there any pros/cons I might have missed as well?

EDIT: I forgot to mention that the container is cleared (clear()) at the end of each frame.
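
For reference, the vector option described above can be sketched roughly as follows. This is only an illustration of the idea, not a measured recommendation: the class name `FrameSet` and its interface are assumptions made for the example, and the capacity passed to the constructor is whatever upper bound seems reasonable for one frame.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-frame container: a vector that rejects duplicates
// with a linear std::find. The name and interface are illustrative only.
class FrameSet {
public:
    // Reserve storage once so that later inserts do not allocate
    // as long as the number of distinct values stays within `expected`.
    explicit FrameSet(std::size_t expected) { values_.reserve(expected); }

    // Returns true if the value was added, false if it was already present.
    bool insert(int value) {
        if (std::find(values_.begin(), values_.end(), value) != values_.end())
            return false;
        values_.push_back(value);
        return true;
    }

    // clear() removes the elements but keeps the allocated storage,
    // so the next frame reuses the same memory.
    void clear() { values_.clear(); }

    const std::vector<int>& values() const { return values_; }
    std::size_t capacity() const { return values_.capacity(); }

private:
    std::vector<int> values_;
};
```

A frame would call `insert` for every candidate value, iterate over `values()` once at the end, and then call `clear()`.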

Upvotes: 21

Answers (3)

Vaughn Cato

Reputation: 64308

I did timing with a few different methods that I thought were likely candidates. Using std::unordered_set was the winner.

Here are my results:

Using UnorderedSet:    0.078s
Using UnsortedVector:  0.193s
Using OrderedSet:      0.278s
Using SortedVector:    0.282s

Timing is based on the median of five runs for each case.

compiler: gcc version 4.9.1
flags:    -std=c++11 -O2
OS:       ubuntu 4.9.1
CPU:      Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz

Code:

#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <iterator>   // std::back_inserter
#include <numeric>    // std::accumulate
#include <random>
#include <set>
#include <unordered_set>
#include <vector>

using std::cerr;
static const size_t n_distinct = 100;

template <typename Engine>
static std::vector<int> randomInts(Engine &engine,size_t n)
{
  auto distribution = std::uniform_int_distribution<int>(0,n_distinct);
  auto generator = [&]{return distribution(engine);};
  auto vec = std::vector<int>();
  std::generate_n(std::back_inserter(vec),n,generator);
  return vec;
}


struct UnsortedVectorSmallSet {
  std::vector<int> values;
  static const char *name() { return "UnsortedVector"; }
  UnsortedVectorSmallSet() { values.reserve(n_distinct); }

  void insert(int new_value)
  {
    auto iter = std::find(values.begin(),values.end(),new_value);
    if (iter!=values.end()) return;
    values.push_back(new_value);
  }
};


struct SortedVectorSmallSet {
  std::vector<int> values;
  static const char *name() { return "SortedVector"; }
  SortedVectorSmallSet() { values.reserve(n_distinct); }

  void insert(int new_value)
  {
    auto iter = std::lower_bound(values.begin(),values.end(),new_value);
    if (iter==values.end()) {
      values.push_back(new_value);
      return;
    }
    if (*iter==new_value) return;
    values.insert(iter,new_value);
  }
};

struct OrderedSetSmallSet {
  std::set<int> values;
  static const char *name() { return "OrderedSet"; }
  void insert(int new_value) { values.insert(new_value); }
};

struct UnorderedSetSmallSet {
  std::unordered_set<int> values;
  static const char *name() { return "UnorderedSet"; }
  void insert(int new_value) { values.insert(new_value); }
};



int main()
{
  //using SmallSet = UnsortedVectorSmallSet;
  //using SmallSet = SortedVectorSmallSet;
  //using SmallSet = OrderedSetSmallSet;
  using SmallSet = UnorderedSetSmallSet;

  auto engine = std::default_random_engine();

  std::vector<int> values_to_insert = randomInts(engine,10000000);
  SmallSet small_set;
  namespace chrono = std::chrono;
  // steady_clock is monotonic, which makes it the safer choice for intervals
  using chrono::steady_clock;
  auto start_time = steady_clock::now();
  for (auto value : values_to_insert) {
    small_set.insert(value);
  }
  auto end_time = steady_clock::now();
  auto& result = small_set.values;

  auto sum = std::accumulate(result.begin(),result.end(),0u);
  auto elapsed_seconds = chrono::duration<float>(end_time-start_time).count();

  cerr << "Using " << SmallSet::name() << ":\n";
  cerr << "  sum=" << sum << "\n";
  cerr << "  elapsed: " << elapsed_seconds << "s\n";
}
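
The listing above times a single run. One way a median-of-five figure could be gathered is with a small harness like the sketch below. The helper names `median` and `medianSeconds` are made up for this example, and this is not necessarily how the numbers above were produced.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of a non-empty sample. For an even count this returns the
// upper of the two middle values.
inline double median(std::vector<double> samples)
{
  auto middle = samples.begin() + samples.size() / 2;
  std::nth_element(samples.begin(), middle, samples.end());
  return *middle;
}

// Hypothetical helper: calls `run` `repeats` times and returns the median
// duration in seconds. `run` is any callable that takes no arguments.
template <typename Run>
double medianSeconds(Run run, std::size_t repeats = 5)
{
  std::vector<double> samples;
  samples.reserve(repeats);
  for (std::size_t i = 0; i != repeats; ++i) {
    auto start = std::chrono::steady_clock::now();
    run();
    auto stop = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double>(stop - start).count());
  }
  return median(samples);
}
```

Each timed run should construct a fresh container inside the callable, for example `medianSeconds([&]{ SmallSet s; for (auto v : values_to_insert) s.insert(v); })`, so that later runs do not start with an already filled set.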

Upvotes: 15

Richard Hodges

Reputation: 69884

I'm going to put my neck on the block here and suggest that the vector route is probably most efficient when the size is 100 and the objects being stored are integral values. The simple reason for this is that set and unordered_set allocate a node for each element they insert, whereas the vector need not allocate more than once.

You can increase search performance dramatically by keeping the vector ordered, since then all searches can be binary searches and therefore complete in O(log2 N) time.

The downside is that the inserts will take a tiny fraction longer due to the memory moves, but it sounds as if there will be many more searches than inserts, and moving (average) 50 contiguous memory words is an almost instantaneous operation.

Final word: Write the correct logic now. Worry about performance when the users are complaining.

EDIT: Because I couldn't help myself, here's a reasonably complete implementation:

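#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <utility>
#include <vector>

template<typename T>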
struct vector_set
{
    using vec_type = std::vector<T>;
    using const_iterator = typename vec_type::const_iterator;
    using iterator = typename vec_type::iterator;

    vector_set(size_t max_size)
    : _max_size { max_size }
    {
        _v.reserve(_max_size);
    }

    /// @returns: pair of iterator, bool
    /// If the value has been inserted, the bool will be true
    /// the iterator will point to the value, or end if it wasn't
    /// inserted due to space exhaustion
    auto insert(const T& elem)
    -> std::pair<iterator, bool>
    {
        auto it = std::lower_bound(_v.begin(), _v.end(), elem);
        if (it != _v.end() && !(*it != elem)) {
            // already present: point at the existing element
            return std::make_pair(it, false);
        }
        if (_v.size() >= _max_size) {
            // new element, but no space left
            return std::make_pair(_v.end(), false);
        }
        return std::make_pair(_v.insert(it, elem), true);
    }

    auto find(const T& elem) const
    -> const_iterator
    {
        auto vend = _v.end();
        auto it = std::lower_bound(_v.begin(), vend, elem);
        if (it != vend && *it != elem)
            it = vend;
        return it;
    }

    bool contains(const T& elem) const {
        return find(elem) != _v.end();
    }

    const_iterator begin() const {
        return _v.begin();
    }

    const_iterator end() const {
        return _v.end();
    }


private:
    vec_type _v;
    size_t _max_size;
};

using namespace std;


BOOST_AUTO_TEST_CASE(play_unique_vector)
{
    vector_set<int> v(100);

    for (size_t i = 0 ; i < 1000000 ; ++i) {
        v.insert(int(random() % 200));
    }

    cout << "unique integers:" << endl;
    copy(begin(v), end(v), ostream_iterator<int>(cout, ","));
    cout << endl;

    cout << "contains 100: " << v.contains(100) << endl;
    cout << "contains 101: " << v.contains(101) << endl;
    cout << "contains 102: " << v.contains(102) << endl;
    cout << "contains 103: " << v.contains(103) << endl;
}

Upvotes: 9

qdii

Reputation: 12963

Since you said you have many insertions and only one traversal, I’d suggest using a vector and pushing the elements in regardless of whether they are already in the vector. Each push is amortized O(1).

Only when you need to traverse the vector, sort it and remove the duplicate elements. I believe this can be done in O(n) as they are bounded integers.
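
The general, comparison-based form of this step can be sketched with the standard sort/unique/erase idiom. The function name below is only illustrative, and this version is O(n log n) rather than linear.

```cpp
#include <algorithm>
#include <vector>

// Sorts `values` and removes duplicates in place.
// std::unique only removes adjacent duplicates, hence the sort first.
inline void sortAndDeduplicate(std::vector<int>& values)
{
  std::sort(values.begin(), values.end());
  values.erase(std::unique(values.begin(), values.end()), values.end());
}
```

Note that with this approach the vector grows with the number of insert calls in a frame, not with the number of distinct values.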

EDIT: Sorting in linear time is possible with counting sort, as presented in this video. If that is not feasible, then you are back to O(n lg(n)).
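
As a rough sketch of the linear-time idea: since only uniqueness matters here, a table of "seen" flags is enough, which is a simplified relative of counting sort rather than the full algorithm. It assumes every value lies in `[0, bound)`. The parameter `bound` and the function name are assumptions for this example, since the question does not state the range of the integers.

```cpp
#include <cstddef>
#include <vector>

// Returns the distinct values of `values` in ascending order.
// Assumes every value lies in [0, bound); `bound` is a hypothetical
// upper limit that the question does not specify.
inline std::vector<int> uniqueSortedBounded(const std::vector<int>& values,
                                            std::size_t bound)
{
  std::vector<char> seen(bound, 0);
  for (int v : values)
    seen[static_cast<std::size_t>(v)] = 1;   // mark each value as present

  std::vector<int> result;
  for (std::size_t v = 0; v != bound; ++v)
    if (seen[v]) result.push_back(static_cast<int>(v));
  return result;
}
```

This runs in O(n + bound) time, so it only pays off when the range of possible values is small compared to the cost of sorting.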

You will have very few cache misses because the vector is contiguous in memory, and very few allocations (especially if you reserve enough memory in the vector).

Upvotes: 2
