Ælex
Ælex

Reputation: 14839

Does std::unordered_set allow insertion of duplicate elements?

I am not understanding something right. I am under the impression that an unordered_set will not allow duplicate elements, based on their hash.

I have a struct, with a specialisation of std::hash, which appears to allow duplicates, although I have manually checked it

AddEdge ( const std::shared_ptr<Relation> relation, const std::shared_ptr<Concept> concept )
{
    auto edge = std::make_shared<Edge>( (Edge){ relation, concept } );
    auto res = _edges.insert ( edge );
    return res.second;
}

An overloaded function does exactly the same but for reversed parameters

This is how struct Edge is hashed:

namespace std
{
    template<> struct hash<shared_ptr<Edge>>
    {
        size_t operator()( const shared_ptr<Edge> & ptr ) const
        {
            size_t from = 0;
            size_t to = 0;

            if ( auto node = std::dynamic_pointer_cast<Concept>( ptr->from ) )
                from = hash<shared_ptr<Concept>>()( node );

            else if ( auto node = std::dynamic_pointer_cast<Relation>( ptr->from ) )
                from = hash<shared_ptr<Relation>>()( node );

            if ( auto node = std::dynamic_pointer_cast<Concept>( ptr->to ) )
                 to = hash<shared_ptr<Concept>>()( node );

            else if ( auto node = std::dynamic_pointer_cast<Relation>( ptr->to ) )
                 to =  hash<shared_ptr<Relation>>()( node );

            return hash<size_t>()( from + to );
        }
    };
}

And the container held in:

std::unordered_set<std::shared_ptr<Edge>> _edges;

When I do:

graph2->AddEdge( sea_node, is_node );
graph2->AddEdge( is_node, blue_node );

I get:

Edge [sea,is] hash = 10017731961838884781
Edge [is,blue] hash = 11178184051384786808

I try a second time exactly the same, and I get the same hashes, yet, when I check the edges, I now have 4 edges instead of 2.

What am I doing wrong?

EDIT: class Concept & Relation have the same kind of hash function:

namespace std
{
    template<> struct hash<shared_ptr<Concept>>
    {
        size_t operator()( const shared_ptr<Concept> & ptr ) const
        {
            return hash<string>()( ptr->asToken()->value() ) + hash<int>()( ptr->TokenIndex() ) + hash<string>()( "Concept" );
        }
    };
}

Even more interestignly, my output from when I add Edges, produces the same hash, and yet it the duplicate Edge is added.

Upvotes: 3

Views: 16106

Answers (1)

Lightness Races in Orbit
Lightness Races in Orbit

Reputation: 385144

an unordered_set will not allow duplicate elements, based on their hash

No, unordered_set avoids duplicates by comparing values, not the hashes of those values.
The "values" of each of your shared pointers is going to differ because they refer to different objects.

You can actually change this behaviour by providing your own function as the KeyEqual template parameter to unordered_set:

template<
    class Key,
    class Hash = std::hash<Key>,           // <-- you've been looking at this
    class KeyEqual = std::equal_to<Key>,   // <-- instead of this
    class Allocator = std::allocator<Key>
> class unordered_set;

If only one value with a given hash were allowed in an unordered_set then (a) you'd be unable to add any values that genuinely resulted in hash collisions, and (b) the entire hash collision resolution mechanism would become entirely unnecessary.

Upvotes: 14

Related Questions