voichek

Reputation: 31

Storing std::vectors in google::dense_hash_map makes it slower

I need mapping from uint64->[list of uint64] (list length is constant per hash table).

I have tried using google::dense_hash_map in two ways:

1. google::dense_hash_map<uint64, std::array<uint64, 10>, Hash64> // 10, for example

2. google::dense_hash_map<uint64, std::vector<uint64>, Hash64>

(1) works much faster (3-fold) than (2). The problem is that I don't want to fix the array size at compile time, but rather when the hash map is created. Is there another option?

Most of my operations modify the values in the hash table, and I sometimes erase all elements so that I can reuse the same allocated memory.

Upvotes: 3

Views: 546

Answers (1)

Daniel Langr

Reputation: 23527

You can try to employ memory pooling to avoid dynamic allocations and get a more compact (cache-friendly) memory layout. Here is a very simple example solution that works for me together with google::dense_hash_map:

#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class pooled_rt_array {
   public:
      static void init_pool(size_t capacity) {
         pool_.resize(capacity);
         top_ = pool_.data();
      }

      pooled_rt_array() : data_(top_), size_(0) { }

      pooled_rt_array(size_t size) : data_(top_), size_(size) {
         assert(top_ + size <= pool_.data() + pool_.size()); // not safe, in fact
         top_ += size;
      }

      T & operator[](size_t n) { return *(data_ + n); }
      const T & operator[](size_t n) const { return *(data_ + n); }
      size_t size() const { return size_; }

   private:
      T * data_;
      size_t size_;

      static std::vector<T> pool_;
      static T * top_;
};

template <typename T>
std::vector<T> pooled_rt_array<T>::pool_;

template <typename T>
T * pooled_rt_array<T>::top_;

A simple usage:

#include <cstdint>
#include <iostream>
#include <sparsehash/dense_hash_map>

int main() {
    using value_t = pooled_rt_array<uint64_t>;
    google::dense_hash_map<uint64_t, value_t
       /* , std::hash<uint64_t>, std::equal_to<uint64_t> */
    > m;
    m.set_empty_key(0xFFFFFFFFFFFFFFFF);

    size_t n;
    std::cin >> n;
    value_t::init_pool(1000000 * n); // pool for 1000000 arrays
    value_t a(n);
    for (size_t i = 0; i < a.size(); i++) a[i] = i;
    m[0] = std::move(a);
}

I also ran a benchmark, and this was much faster than storing std::vector in the map.

If all the arrays have the same size, you can even remove the size_ member variable from pooled_rt_array, which makes each instance the size of a single pointer.
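A sketch of that fixed-size variant (the class and member names are illustrative, not from the answer above; a static factory replaces the sizing constructor, since the default constructor must stay cheap for the hash map's empty slots):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All arrays share one length, stored once per type instead of per
// instance, so each object holds nothing but a pointer into the pool.
template <typename T>
class pooled_fixed_array {
public:
    static void init_pool(std::size_t array_size, std::size_t count) {
        size_ = array_size;
        pool_.resize(array_size * count);
        top_ = pool_.data();
    }

    pooled_fixed_array() : data_(nullptr) { }  // cheap default for empty slots

    // Carve the next fixed-size chunk out of the pool.
    static pooled_fixed_array make() {
        assert(top_ + size_ <= pool_.data() + pool_.size()); // not safe, in fact
        pooled_fixed_array a;
        a.data_ = top_;
        top_ += size_;
        return a;
    }

    T & operator[](std::size_t n) { return data_[n]; }
    const T & operator[](std::size_t n) const { return data_[n]; }
    static std::size_t size() { return size_; }

private:
    T * data_;  // sole non-static member: sizeof(pooled_fixed_array<T>) == sizeof(T*)

    static std::vector<T> pool_;
    static T * top_;
    static std::size_t size_;
};

template <typename T> std::vector<T> pooled_fixed_array<T>::pool_;
template <typename T> T * pooled_fixed_array<T>::top_;
template <typename T> std::size_t pooled_fixed_array<T>::size_;
```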


Note that there are much more sophisticated ways to employ memory pooling, such as those provided by Boost. For example, you can use pool-aware allocators together with std::vector.

Upvotes: 1
