Reputation: 2109
I have about 20,000,000 pair<int, int>
which I need to associate to int
s. I did so with an unordered_map<pair<int, int>, int>
. Profiling my algorithm shows that checking whether an entry exists or not
bool exists = myMap[make_pair(a, b)] != NULL
is the performance bottleneck. I thought that retrieving this information from an unordered_map
would be really fast, as it is O(1). But constant time can be slow if the constant is big...
My hash-function is
template <>
struct tr1::hash<pair<int, int> > {
public:
size_t operator()(pair<int, int> x) const throw() {
size_t h = x.first * 1 + x.second * 100000;
return h;
}
};
Do you know any better data-structure for my problem?
Obviously I can't just store the information in a matrix, hence the amount of memory wouldn't fit into any computer in existence. All I know about the distribution is that myMap[make_pair(a, a)]
doesn't exist for any a
. And that all int
s are in a continuous range from 0 to about 20,000,000.
Think of it as a sparse 20,000,000x20,000,000-Matrix with about 20,000,000 entries but never on the main diagonal.
Would a vector<pair<int, int>>*
(array with N entries) expected to be faster? The lookup for a
would be trivial (just the index of the array) and then I would iterate through the vector, comparing the first
value of the pair to b
.
I uploaded the raw data so you can see the structure.
Upvotes: 0
Views: 3406
Reputation: 2109
As suggestet, I went with a vector<pair<int, int>>*
with N entries. It's about 40% faster than the unordered_map
.
Upvotes: 1
Reputation: 106068
Main thing is definitely to avoid adding default-constructed elements with every search:
bool exists = myMap[make_pair(a, b)] != NULL; // OUCH
bool exists = myMap.find(make_pair(a, b)) != myMap.end(); // BETTER
iterator i = myMap.find(make_pair(a, b);
if (i != myMap.end()) ... else ...; // MAY BE BEST - SEE BELOW
And the great hash challenge... woo hoo! This might be worth a shot, but a lot depends on how the numbers in the pairs are distributed and your implementation's std::hash
(which is often pass-through!):
size_t operator()(pair<int, int> x) const throw() {
size_t hf = std::hash(x.first);
return (hf << 2) ^ (hf >> 2) ^ std::hash(x.second);
}
You may also find it faster if you replace the pair with int64_t
s, so that the key comparisons are definitely simple integer comparisons rather than cascaded.
Also, what are you doing after the test for existence? If you need to access/change the value associated with the same key then you should save the iterator find
returns and avoid another search.
Upvotes: 2
Reputation: 171097
First off, myMap[make_pair(a, b)] != NULL
does not do what you think it does. It inserts the pair if it doesn't exist, and compares the mapped value to 0 (which is what NULL
expands to). It does not check for existence at all. (Note that in modern C++, you should never use NULL
. Use 0 for numbers and nullptr
for pointers).
As for the main topic, your hash function doesn't seem too good. Don't forget that arithmetic on int
s is done in int
s. Since on most compilers int
is 32-bit, its maximum value is a little over 2,000,000,000. So 20,000,000 * 10,000 is way bigger than that, leading to overflow (and undefined behaviour).
Given the number of your data, I assume you're on a 64-bit platform, which means size_t
is 64 bits long. So you might get better results with a hash function like this:
size_t operator()(pair<int, int> x) const throw() {
size_t f = x.first, s = x.second;
return f << (CHAR_BIT * sizeof(size_t) / 2) | s;
}
This should produce significantly less collisions (and have defined behaviour) that what you have now.
If this doesn't help, you could also try a two-step approach:
std::unordered_map<int, std::unordered_map<int, int>>
Lookup by x.first
first, then by x.second
. I don't know if this would help; measure and see.
Upvotes: 3
Reputation: 18652
I suggest you test with a better hash function. You can find examples if you search here on SO but this is one possible implementation.
struct pair_hash {
template <typename T1, typename T2>
size_t operator()(const std::pair<T1, T2> &pr) const {
using std::hash;
return hash<T1>()(pr.first) ^ hash<T2>()(pr.second);
}
};
Upvotes: 0
Reputation: 7663
Have you tried using myMap.find(make_pair(a,b)) != myMap.end()
? operator[]
creates the element if it does not exist. I would expect find
to be faster.
Upvotes: 5