justHelloWorld
justHelloWorld

Reputation: 6828

unordered_map : why range operation are inefficient, and this is case?

I got this example about implementing generic memoization in C++. However, as someone made notice in this comment, the original code makes 2 lookups, while the code below makes only one.

template <typename ReturnType, typename... Args>
std::function<ReturnType (Args...)> memoize(std::function<ReturnType (Args...)> func)
{
    std::map<std::tuple<Args...>, ReturnType> cache;
    return ([=](Args... args) mutable  {
            std::tuple<Args...> t(args...);
auto range = cache.equal_range(t);
if (range.first != range.second) return (*range.first).second;
return (*cache.insert(range.first, func(args...))).second;

    });
}

Someone made notice that using unordered_map would have probably better performances. But I've read that:

they are generally less efficient for range iteration through a subset of their elements. I don't understand why the range operation are less efficient, and if the case above is one of these cases (since we use range)?

Upvotes: 1

Views: 144

Answers (1)

Useless
Useless

Reputation: 67743

as someone made notice in this comment, the original code makes 2 lookups, while the code below makes only one

OK, so you're trying to use the hinted insert to avoid a redundant logarithmic-complexity search for the insertion point. You still have the same bug as the other question, and your code should probably look like:

cache.insert(range.first, std::make_pair(t, func(args...)));
//           ^hint        ^value_type

(this is overload #4 in the linked doc).

You could do without the first lookup entirely: insert simply returns an iterator to the existing element if the key is present, so you could write it with an optimistic insert - whether this is better depends on the cost of default-constructing and then assigning your ReturnType:

auto result = cache.insert(make_pair(t, ReturnType{}));
if (result.second) {
    // insertion succeeded so the value wasn't cached already
    result.first->second = func(args...);
}
return result.first->second;

... using unordered_map would have probably better performance ...

std::map find/insert scale with logarithmic complexity, and std::unordered_map scales with constant complexity. Which is actually better depends on how many entries you have, amongst other factors, and you need to profile to find out for sure.

... [unordered_map is] less efficient for range iteration ...

Well, you don't need range iteration at all if you avoid equal_range.

Look at the operations you actually use, and understand which data structure is a natural fit. For simple lookups and insertions, which are all you really need here, unordered_map is probably fine. std::map is better if you care about the relative ordering of keys, which you don't.

Both structures also have different requirements on the key: map needs an ordering (operator< or a custom comparator), while unordered_map needs a well-defined hash function and operator== or custom predicate.

For a start, I'm not sure std::hash supports std::tuple, so you might need to provide a custom hasher to use unordered_map.

Upvotes: 1

Related Questions