Reputation: 41
What is the difference in performance between using a std::unordered_map and a std::vector when the keys are integers? I have around 100-1000 elements with contiguous IDs that I can use to index into a vector. The reason to use a hash table instead is that it is more convenient to index by the objects themselves.
Imagine three different situations:
Note that I am asking this as a general question, not as a code-specific one.
Upvotes: 2
Views: 2930
Reputation: 66991
vector vs. map have critical differences in how they treat keys. Ignore performance, and use whichever one has the right behavior for keys when a node is inserted or deleted in the middle.
In general, if you're mapping IDs to values, the right behavior is that when a node is deleted, the other items' keys should not change, and therefore a map is the best choice. It is the least surprising container for this task, and developers will therefore produce the fewest bugs in the surrounding code.
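To illustrate that point, here is a minimal sketch (my own example, not part of the original answer): erasing a middle element from a vector shifts every later element's index, while a map's keys stay attached to their values.

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> vec = {"a", "b", "c"};
    std::map<int, std::string> map = {{0, "a"}, {1, "b"}, {2, "c"}};

    vec.erase(vec.begin() + 1); // "c" silently moves from index 2 to index 1
    map.erase(1);               // keys 0 and 2 still point at "a" and "c"

    std::cout << vec[1] << '\n';    // prints "c": the index shifted
    std::cout << map.at(2) << '\n'; // prints "c": the key did not change
}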
If you have fewer than ~1000 items, then performance doesn't matter. If, however, you've measured that this code actually affects performance, and you're also 100% certain that values are never inserted or removed while the "map" is alive, then a std::vector is far faster to create, initialize, and use. However, if you use a std::vector, developers are likely to make incorrect assumptions in the surrounding code, and the code will likely eventually grow bugs, so you should only make this switch if it measurably makes a big difference in application performance. It doesn't matter how fast your app is if it's wrong.
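As a sketch of what that fast path can look like, assuming the IDs run contiguously from 0 and the element set is fixed (the name value_by_id is mine, for illustration):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    // IDs are contiguous 0..N-1 and nothing is ever inserted or
    // removed, so a vector indexed by ID acts as a map with no
    // hashing and no pointer chasing.
    constexpr std::size_t N = 1000;
    std::vector<double> value_by_id(N, 0.0);

    value_by_id[42] = 3.14;               // "insert" for key 42
    std::cout << value_by_id[42] << '\n'; // lookup for key 42
}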
Upvotes: 0
Reputation: 170
I just wondered about the same question. Although I generally agree that in most cases one should use the most convenient container, and also that, when in doubt, you should measure yourself, I think it is nice to have at least a general idea of the dimensions we're talking about.
Hence, I implemented a small example:
#include <iostream>
#include <vector>
#include <unordered_map>
#include <chrono>
#include <cstdlib>

struct MyStruct {
    int value;
};

int main(int argc, char** argv) {
    size_t const SIZE = 100;
    size_t const ACCESSES = 10000;

    // Build the objects whose addresses will serve as hash-table keys.
    std::vector<MyStruct const*> my_structs;
    for (size_t i = 0; i < SIZE; ++i) {
        my_structs.push_back(new MyStruct());
    }

    // Fill both containers with identical random values:
    // the vector is indexed by position, the map by object pointer.
    std::vector<double> values_vec(my_structs.size());
    std::unordered_map<MyStruct const*, double> values_map;
    for (size_t i = 0; i < SIZE; ++i) {
        double rand_val = (rand() % 1000) / 10.0;
        values_vec[i] = rand_val;
        values_map[my_structs[i]] = rand_val;
    }

    // Pre-generate the same random access pattern for both containers,
    // so the rand() calls are not part of the measured loops.
    std::vector<MyStruct const*> queries_ptr;
    std::vector<size_t> queries_int;
    for (size_t i = 0; i < ACCESSES; ++i) {
        size_t idx = rand() % SIZE;
        queries_int.push_back(idx);
        queries_ptr.push_back(my_structs[idx]);
    }

    // Time random reads from the vector.
    auto begin_vec = std::chrono::steady_clock::now();
    double large_sum_vec = 0;
    for (size_t query : queries_int) {
        large_sum_vec += values_vec[query];
    }
    auto end_vec = std::chrono::steady_clock::now();

    // Time random reads from the unordered_map
    // (end_vec doubles as the start time of this measurement).
    double large_sum_map = 0;
    for (MyStruct const* query : queries_ptr) {
        large_sum_map += values_map[query];
    }
    auto end_map = std::chrono::steady_clock::now();

    std::cout << "Results for " << ACCESSES << " accesses to vector of size " << SIZE << std::endl;
    std::cout << "=== VEC === Result = " << large_sum_vec << " Time = " << std::chrono::duration_cast<std::chrono::microseconds>(end_vec - begin_vec).count() << " microseconds" << std::endl;
    std::cout << "=== MAP === Result = " << large_sum_map << " Time = " << std::chrono::duration_cast<std::chrono::microseconds>(end_map - end_vec).count() << " microseconds" << std::endl;

    for (size_t i = 0; i < SIZE; ++i) {
        delete my_structs[i];
    }
}
You can find it here on Coliru: https://coliru.stacked-crooked.com/a/a986dd2607a8566a
The result: std::vector with indices is about 10-20 times faster than std::unordered_map for random read access.
Interestingly, both running times are almost independent of the data size. With an increasing number of accesses, the ratio shifts towards 1:20. I did not test write-intensive or mixed workloads, but I don't expect much of a difference, since by that point the element has already been found.
Upvotes: 0
Reputation: 238471
Generally:
A vector has better cache locality due to the lack of indirection. Accessing an index is faster because there is no hash function to compute. Iteration has predictable branches.
An unordered map uses less memory when the key space is sparse (you said that your IDs are contiguous, so this advantage does not apply to you). Adding or removing elements at arbitrary keys is asymptotically faster with an unordered map.
Asymptotic complexity doesn't necessarily matter when you have only very few elements, such as 100-1000. Cache locality and branch prediction tend to dominate in that case.
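A small sketch of that asymptotic difference (my own illustration, with sizes chosen arbitrarily):

#include <unordered_map>
#include <vector>

int main() {
    const int N = 100000;
    std::vector<int> vec(N, 1);
    std::unordered_map<int, int> umap;
    for (int i = 0; i < N; ++i) umap[i] = 1;

    // O(n): every element after position N/2 is shifted down by one.
    vec.erase(vec.begin() + N / 2);

    // Average O(1): only the bucket holding key N/2 is touched.
    umap.erase(N / 2);
}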
First pick whichever data structure is more convenient. Then measure whether accessing that structure has a significant impact on the performance of the program as a whole. If it does, measure the difference against the other data structure to see if it is significantly faster (relative to the variance of your measurements).
Upvotes: 4
Reputation: 11146
Generally speaking: if you have something like 100-1000 elements, the container doesn't really matter in itself. Using a std::map is even better than a std::unordered_map, for example, if you ever need to debug-print the contents (unless you somehow rely on the hash). It's when you get to something like 100k+ elements that container performance starts to get interesting.
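A minimal sketch of the debug-printing point (my example, not the answerer's): a std::map iterates in key order, while a std::unordered_map iterates in an unspecified, hash-dependent order.

#include <iostream>
#include <map>
#include <string>
#include <unordered_map>

int main() {
    std::map<int, std::string> ordered{{3, "c"}, {1, "a"}, {2, "b"}};
    std::unordered_map<int, std::string> hashed{{3, "c"}, {1, "a"}, {2, "b"}};

    for (const auto& [key, value] : ordered)
        std::cout << key << " -> " << value << '\n'; // always 1, 2, 3

    for (const auto& [key, value] : hashed)
        std::cout << key << " -> " << value << '\n'; // some unspecified order
}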
Upvotes: 0