yui

Reputation: 41

unordered_map vs vector with int keys

What is the difference in performance between using a std::unordered_map and a std::vector when the keys are integers? I have around 100-1000 elements with contiguous IDs that I can use to index a vector. The reason to use a hash table is that it is more convenient to index by the objects themselves.

Imagine three different situations:

Note that I ask this as a general question, not as a code-specific one.

Upvotes: 2

Views: 2930

Answers (4)

Mooing Duck

Reputation: 66991

vector and map have critical differences in how they treat keys. Ignore performance, and use whichever one has the right behavior for keys when an element is inserted or deleted in the middle.

In general, if you're mapping IDs to values, the right behavior is that when an element is deleted, the other items' keys should not change, and therefore a map is the best choice. It is the least surprising container for this task, and developers will therefore introduce the fewest bugs in surrounding code.

If you have fewer than ~1000 items, then performance doesn't matter. If, however, you've measured that this code actually affects performance, and you're also 100% certain that values are never inserted or removed while the "map" is alive, then a std::vector is far faster to create, initialize, and use. However, if you use a std::vector, developers are likely to make incorrect assumptions in surrounding code, and the code will likely eventually grow bugs, so you only want to make this switch if it measurably makes a big difference in app performance. It doesn't matter how fast your app is if it's wrong.

Upvotes: 0

cero

Reputation: 170

I just wondered about the same question. Although I generally agree that one should in most cases use the most convenient container, and that, when in doubt, you should measure yourself, I think it is nice to have at least a general idea of the dimensions we're talking about.

Hence, I implemented a small example:

#include <iostream>
#include <vector>
#include <unordered_map>
#include <chrono>
#include <cstdlib>  // rand

struct MyStruct {
    int value;
};


int main(int argc, char** argv) {
    size_t const SIZE = 100;
    size_t const ACCESSES = 10000;

    std::vector<MyStruct const*> my_structs;
    for (size_t i = 0; i < SIZE; ++i) {
        my_structs.push_back(new MyStruct());
    }
    
    std::vector<double> values_vec(my_structs.size());
    std::unordered_map<MyStruct const*, double> values_map;
    for (size_t i = 0; i < SIZE; ++i) {
        double rand_val = (rand() % 1000) / 10.0;  // 10.0: avoid integer division

        values_vec[i] = rand_val;
        values_map[my_structs[i]] = rand_val;
    }

    std::vector<MyStruct const*> queries_ptr;
    std::vector<size_t> queries_int;
    for (size_t i = 0; i < ACCESSES; ++i) {
        size_t idx = rand() % SIZE;
        queries_int.push_back(idx);
        queries_ptr.push_back(my_structs[idx]);
    }

    auto begin_vec = std::chrono::steady_clock::now();
    double large_sum_vec = 0;
    for (size_t query : queries_int) {
        large_sum_vec += values_vec[query];
    }
    auto end_vec = std::chrono::steady_clock::now();

    double large_sum_map = 0;
    for (MyStruct const* query : queries_ptr) {
        large_sum_map += values_map[query];
    }
    auto end_map = std::chrono::steady_clock::now();

    std::cout << "Results for " << ACCESSES << " accesses to vector of size " << SIZE << std::endl;
    std::cout << "=== VEC === Result = " << large_sum_vec << " Time = " << std::chrono::duration_cast<std::chrono::microseconds>(end_vec - begin_vec).count() << " microseconds" << std::endl;
    std::cout << "=== MAP === Result = " << large_sum_map << " Time = " << std::chrono::duration_cast<std::chrono::microseconds>(end_map - end_vec).count() << " microseconds" << std::endl;

    for (size_t i = 0; i < SIZE; ++i) {
        delete my_structs[i];
    }
}

You can find it here on Coliru: https://coliru.stacked-crooked.com/a/a986dd2607a8566a

The result: std::vector with indices is about 10-20 times faster than std::unordered_map for random read access.

Interestingly, both running times are almost independent of the data size. With an increasing number of accesses, the ratio shifts more towards 1:20. I did not test write-intensive or mixed workloads, but I don't expect much of a difference, since the element has already been found at that point.

Upvotes: 0

eerorika

Reputation: 238471

Generally:

Vector has better cache locality due to the lack of indirection. Accessing an index is faster because there is no need to calculate a hash function. Iteration has predictable branches.

Unordered map uses less memory with sparse structures (you said that indices are continuous, so this advantage does not apply to you). Adding or removing elements in arbitrary indices is asymptotically faster with unordered map.

Asymptotic complexity doesn't necessarily matter when you have only very few elements, such as 100-1000. Cache locality and branch prediction tend to dominate in this case.

First pick whichever data structure is more convenient. Then measure whether accessing that structure has a significant impact on the performance of the program as a whole. If it does, then measure the difference with the other data structure to see if it is significantly faster (in relation to the variance of the measurement).

Upvotes: 4

darune

Reputation: 11146

Use the most convenient container for the task

Generally speaking.

If you have something like 100-1000 elements, the container doesn't really matter in itself. Using a std::map is even better than std::unordered_map, for example, if you ever need to debug-print the content, unless you somehow rely on the hash. It's when you get to something like 100k+ elements that container performance starts to get interesting.

Upvotes: 0
