Reputation: 339
I have been reading and experimenting with the standard library's smart pointers, unique_ptr
and shared_ptr
and although they are obviously great replacements for a lot of cases where raw pointers might be deemed dangerous I am unsure of their use when implementing data structures.
To experiment, I have written an example of a hash map which uses shared_ptr
- which according to the Meyer's Effective Modern C++ are roughly double the size of a unique_ptr
. For this reason I would like to use unique_ptr but I am kind of stumped due to what I am performing in the Add function, updating and copying.
Does anyone have any suggestions for this problem? Should data structures remain to be written using raw pointers?
#pragma once
#include "core.h"
const int TABLE_SIZE = 256;
template<typename K>
class HashKey {
public:
unsigned long operator()(const K& p_key) const {
return (p_key) % TABLE_SIZE;
}
};
template<typename K, typename T>
class HashNode {
public:
K m_key;
T m_value;
std::shared_ptr<HashNode> next = nullptr;
};
template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
std::array< std::shared_ptr< HashNode<K, T> >, 128 > m_table;
F m_hash_function;
int m_elem_count{ 0 };
void Add(K p_key, T p_value);
};
template<typename K, typename T, typename F = HashKey<K>>
void HashMap<K, T, F>::Add(K p_key, T p_value)
{
unsigned long key = m_hash_function(p_key);
std::shared_ptr<HashNode<K, T>> new_node = std::make_shared<HashNode<K, T>>();
new_node->m_key = p_key;
new_node->m_value = p_value;
if (m_table[key] == nullptr) {
/* only item in the bucket */
m_table[key] = std::move(new_node);
m_elem_count++;
}
else {
/* check if item exists so it is replaced */
std::shared_ptr< HashNode<K, T> > current = m_table[key];
std::shared_ptr< HashNode<K, T> > previous = m_table[key];
while (current != nullptr && p_key != current->m_key ) {
previous = current;
current = current->next;
}
if (current == nullptr) {
previous->next = new_node;
//current = new_node;
m_elem_count++;
}
else {
current->m_value = p_value;
}
}
}
void TestHashMap() {
HashMap<int, std::string> hash_map;
hash_map.Add(1, "one");
hash_map.Add(2, "does");
hash_map.Add(3, "not");
hash_map.Add(50, "simply");
hash_map.Add(11, "waltz");
hash_map.Add(11, "into");
hash_map.Add(191, "mordor");
std::cout << hash_map.m_elem_count << std::endl;
}
Upvotes: 12
Views: 1806
Reputation: 93284
The choice of smart pointer depends on how your data structure "owns" the heap-allocated objects.
If you need to simply observe, and not own an object (independently of whether it is heap-allocated or not), a raw pointer, a reference or an std::reference_wrapper
is the appropriate choice.
If you need unique ownership (at most one owner of the heap-allocated object), then use std::unique_ptr
. It has no additional time/memory overhead.
If you need shared ownership (any number of owners of the heap-allocated object), then use std::shared_ptr
. It results in additional time/memory overhead because an additional pointer (to the reference count metadata) has to be stored, and because accessing it is guaranteed to be thread-safe.
There's no need to use std::unique_ptr
(in place of a raw pointer) unless you actually need to own the object.
Assuming you need to own the object, there's no need to use std::shared_ptr
(in place of an std::unique_ptr
) unless you actually need shared ownership semantics.
In your case, it looks like you have a maximum of heap nodes in your HashMap
. Therefore, I'm assuming that you want the HashMap
instance to be the unique owner of the nodes.
What type should you use?
template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
std::array</* ? */, 128 > m_table;
// ...
};
You have two options:
If you want to store the objects with an heap indirection, use std::unique_ptr
, as the unique owner of these heap-allocated object is and will always be the HashMap
instance.
If you want to store the objects directly into the HashMap
, with no heap indirection, then do not use any pointer at all. This could lead to very big HashMap
instances. Interface for accessing the next
nodes becomes cumbersome.
Option 1 (store nodes in the heap):
This is the most common, and probably best option.
template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
std::array<std::unique_ptr<HashNode<K, T>>, 128 > m_table;
// ...
};
This will result in lighter (in terms of memory footprint) HashMap
instances.
Note: using an std::vector
in place of an std::array
will reduce the size of HashMap
significantly, but will introduce an additional heap indirection. This is the common way of implementing a similar data structure. You generally want the HashMap
instance to be as lightweight as possible, so that it can be copied/moved/stored efficiently.
There's no need to use smart pointers to connect the nodes between each other, as the nodes are owned exclusively by HashMap
. A raw pointer will be sufficient.
template<typename K, typename T>
class HashNode {
public:
// ...
HashNode* next_ptr = nullptr;
auto& next()
{
assert(next_ptr != nullptr);
return *next_ptr;
}
};
The code above will work fine, assuming that the HashMap
is still alive when accessing next
.
Option 2 (store nodes in the map instance):
template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
std::array<HashNode<K, T>, 128 > m_table;
// ...
};
HashMap
instances may be enormous, depending on the size of HashNode<K, T>
.
If you choose to store the nodes directly into the HashMap
with no heap indirection, you will have to use an index to access the internal array, as moving/copying the HashMap
around will change the memory address of the nodes.
template<typename K, typename T>
class HashNode {
public:
// ...
int next_index = -1;
auto& next(HashMap& map)
{
assert(next_index != -1);
return map.m_table[next_index];
}
};
Upvotes: 10