V. Semeria
V. Semeria

Reputation: 3256

Emplace or merge on std::unordered_set

I am trying to implement this emplace or merge

template<typename T>
T& EmplaceOrMerge(std::unordered_set<T>& s,
                  T&& t,
                  std::function<void(T&& a, T& b)> merge)
{
    auto it = s.emplace(std::move(t));
    T& u = const_cast<T&>(*it.first);
    if (!it.second)
        merge(std::move(t), u);
    return u;
}

The merge function modifies its second argument in a way that preserves its hashed value. I am concerned with the use of std::move(t) in the merge case, because the emplace may have moved it already. I have read Microsoft's implementation of unordered_set and there is a very nice special case for that, if constexpr (_In_place_key_extractor::_Extractable). It recognizes that its argument std::move(t) can be hashed directly (and compared with operator== directly), without constructing another object T, and returns immediately the equivalent value in the unordered_set when there is one.

Does this special case occur in all implementations of the standard library ? If not I have undefined behaviour, and I wonder if there is another way to code this EmplaceOrMerge.

Upvotes: 3

Views: 821

Answers (2)

LoS
LoS

Reputation: 1833

The std::unordered_set container stores unique keys. This means that, before inserting an element, it verifies whether the key is already stored in the hash table. In case the key is already exists, no insertion is performed.

The problem with the emplace() member function is that it requires to construct in-place the node to access the key. If it already exists, the node will be destroyed, potentially losing the information that has been used to construct it.

Upvotes: 0

ecatmur
ecatmur

Reputation: 157444

No, libstdc++ does not perform this optimization.

struct A {
    A() = default;
    A(A&&) { std::format_to(std::ostreambuf_iterator<char>(std::cout), "A(A&&)\n"); }
    bool operator==(A const&) const = default;
};
template<> struct std::hash<A> { std::size_t operator()(A const&) const { return 0; } };
int main() {
    std::unordered_set<A> s;
    A a;
    std::format_to(std::ostreambuf_iterator<char>(std::cout), "{}\n",
        s.emplace(std::move(a)).second);
    std::format_to(std::ostreambuf_iterator<char>(std::cout), "{}\n",
        s.emplace(std::move(a)).second);
}

This program prints:

A(A&&)
true
false

under libc++ (and, presumably, under MS-STL), but prints

A(A&&)
true
A(A&&)
false

under libstdc++.

Demo.


I wonder if there is another way to code this EmplaceOrMerge.

You can't get around the fact that libstdc++ will only call Hash on an already constructed node. If you can't change the data structure (e.g. to a std::unordered_map from an extracted key) one option would be to use the node-handle interface, which can avoid side effects on failed insertion. Using it may still pay the cost of a move and move-assign back, but hopefully that is relatively cheap:

template<class T>
auto try_emplace(std::unordered_set<T>& s, std::type_identity_t<T>&& t) {
    std::unordered_set<T> s2;
    auto nh = s2.extract(s2.insert(std::move(t)).first);
    auto const ins = s.insert(std::move(nh));
    if (not ins.inserted)
        t = std::move(ins.node.value());
    return std::pair(ins.position, ins.inserted);
}

Demo.

And in your case, you can shortcut the move-assign back, so the overhead is only one move (and extra node allocation):

template<typename T>
T& EmplaceOrMerge(std::unordered_set<T>& s,
                  T&& t,
                  std::function<void(T&& a, T& b)> merge)
{
    std::unordered_set<T> s2;
    auto nh = s2.extract(s2.insert(std::move(t)).first);
    auto const ins = s.insert(std::move(nh));
    T& u = const_cast<T&>(*ins.position);
    if (not ins.inserted)
        merge(std::move(ins.node.value()), u);
    return u;
}

Upvotes: 1

Related Questions