Reputation: 3256
I am trying to implement this emplace or merge
template<typename T>
T& EmplaceOrMerge(std::unordered_set<T>& s,
T&& t,
std::function<void(T&& a, T& b)> merge)
{
auto it = s.emplace(std::move(t));
T& u = const_cast<T&>(*it.first);
if (!it.second)
merge(std::move(t), u);
return u;
}
The merge function modifies its second argument in a way that preserves its hashed value. I am concerned with the use of std::move(t)
in the merge case, because the emplace
may have moved it already. I have read Microsoft's implementation of unordered_set
and there is a very nice special case for that, if constexpr (_In_place_key_extractor::_Extractable)
. It recognizes that its argument std::move(t)
can be hashed directly (and compared with operator==
directly), without constructing another object T
, and returns immediately the equivalent value in the unordered_set
when there is one.
Does this special case occur in all implementations of the standard library ? If not I have undefined behaviour, and I wonder if there is another way to code this EmplaceOrMerge
.
Upvotes: 3
Views: 821
Reputation: 1833
The std::unordered_set
container stores unique keys. This means that, before inserting an element, it verifies whether the key is already stored in the hash table. In case the key is already exists, no insertion is performed.
The problem with the emplace()
member function is that it requires to construct in-place the node to access the key. If it already exists, the node will be destroyed, potentially losing the information that has been used to construct it.
Upvotes: 0
Reputation: 157444
No, libstdc++ does not perform this optimization.
struct A {
A() = default;
A(A&&) { std::format_to(std::ostreambuf_iterator<char>(std::cout), "A(A&&)\n"); }
bool operator==(A const&) const = default;
};
template<> struct std::hash<A> { std::size_t operator()(A const&) const { return 0; } };
int main() {
std::unordered_set<A> s;
A a;
std::format_to(std::ostreambuf_iterator<char>(std::cout), "{}\n",
s.emplace(std::move(a)).second);
std::format_to(std::ostreambuf_iterator<char>(std::cout), "{}\n",
s.emplace(std::move(a)).second);
}
This program prints:
A(A&&)
true
false
under libc++ (and, presumably, under MS-STL), but prints
A(A&&)
true
A(A&&)
false
under libstdc++.
Demo.
I wonder if there is another way to code this
EmplaceOrMerge
.
You can't get around the fact that libstdc++ will only call Hash
on an already constructed node. If you can't change the data structure (e.g. to a std::unordered_map
from an extracted key) one option would be to use the node-handle interface, which can avoid side effects on failed insertion. Using it may still pay the cost of a move and move-assign back, but hopefully that is relatively cheap:
template<class T>
auto try_emplace(std::unordered_set<T>& s, std::type_identity_t<T>&& t) {
std::unordered_set<T> s2;
auto nh = s2.extract(s2.insert(std::move(t)).first);
auto const ins = s.insert(std::move(nh));
if (not ins.inserted)
t = std::move(ins.node.value());
return std::pair(ins.position, ins.inserted);
}
Demo.
And in your case, you can shortcut the move-assign back, so the overhead is only one move (and extra node allocation):
template<typename T>
T& EmplaceOrMerge(std::unordered_set<T>& s,
T&& t,
std::function<void(T&& a, T& b)> merge)
{
std::unordered_set<T> s2;
auto nh = s2.extract(s2.insert(std::move(t)).first);
auto const ins = s.insert(std::move(nh));
T& u = const_cast<T&>(*ins.position);
if (not ins.inserted)
merge(std::move(ins.node.value()), u);
return u;
}
Upvotes: 1