Smart or raw pointers when implementing data structures?

Question

I have been reading and experimenting with the standard library's smart pointers, unique_ptr and shared_ptr and although they are obviously great replacements for a lot of cases where raw pointers might be deemed dangerous I am unsure of their use when implementing data structures.

To experiment, I have written an example of a hash map which uses shared_ptr - which according to the Meyer's Effective Modern C++ are roughly double the size of a unique_ptr. For this reason I would like to use unique_ptr but I am kind of stumped due to what I am performing in the Add function, updating and copying.

Does anyone have any suggestions for this problem? Should data structures remain to be written using raw pointers?

#pragma once
#include "core.h"

const int TABLE_SIZE = 256;

template
class HashKey {
public:
    unsigned long operator()(const K& p_key) const {
        return (p_key) % TABLE_SIZE;
    }
};

template
class HashNode {
public:
    K m_key;
    T m_value;
    std::shared_ptr next = nullptr;
};


template>
class HashMap {
public:
    std::array< std::shared_ptr< HashNode >, 128 > m_table;
    F m_hash_function;
    int m_elem_count{ 0 };

    void Add(K p_key, T p_value);
};

template>
void HashMap::Add(K p_key, T p_value)
{
    unsigned long key = m_hash_function(p_key);

    std::shared_ptr> new_node = std::make_shared>();
    new_node->m_key = p_key;
    new_node->m_value = p_value;


    if (m_table[key] == nullptr) {
        /* only item in the bucket */
        m_table[key] = std::move(new_node);
        m_elem_count++;
    }
    else {
        /* check if item exists so it is replaced */
        std::shared_ptr< HashNode > current = m_table[key];
        std::shared_ptr< HashNode > previous = m_table[key];
        while (current != nullptr && p_key != current->m_key ) {
            previous = current;
            current = current->next;
        }
        if (current == nullptr) {
            previous->next = new_node;
            //current = new_node;
            m_elem_count++;
        }
        else {
            current->m_value = p_value;
        }

    }
}

void TestHashMap() {

    HashMap hash_map;

    hash_map.Add(1, "one");
    hash_map.Add(2, "does");
    hash_map.Add(3, "not");
    hash_map.Add(50, "simply");
    hash_map.Add(11, "waltz");
    hash_map.Add(11, "into");
    hash_map.Add(191, "mordor");

    std::cout << hash_map.m_elem_count << std::endl;
}

Vittorio Romeo · Accepted Answer

The choice of smart pointer depends on how your data structure "owns" the heap-allocated objects.

If you need to simply observe, and not own an object (independently of whether it is heap-allocated or not), a raw pointer, a reference or an std::reference_wrapper is the appropriate choice.

If you need unique ownership (at most one owner of the heap-allocated object), then use std::unique_ptr. It has no additional time/memory overhead.

If you need shared ownership (any number of owners of the heap-allocated object), then use std::shared_ptr. It results in additional time/memory overhead because an additional pointer (to the reference count metadata) has to be stored, and because accessing it is guaranteed to be thread-safe.

There's no need to use std::unique_ptr (in place of a raw pointer) unless you actually need to own the object.

Assuming you need to own the object, there's no need to use std::shared_ptr (in place of an std::unique_ptr) unless you actually need shared ownership semantics.

In your case, it looks like you have a maximum of heap nodes in your HashMap. Therefore, I'm assuming that you want the HashMap instance to be the unique owner of the nodes.

What type should you use?

template>
class HashMap {
public:
    std::array m_table;
    // ...
};

You have two options:

If you want to store the objects with an heap indirection, use std::unique_ptr, as the unique owner of these heap-allocated object is and will always be the HashMap instance.
If you want to store the objects directly into the HashMap, with no heap indirection, then do not use any pointer at all. This could lead to very big HashMap instances. Interface for accessing the next nodes becomes cumbersome.

Option 1 (store nodes in the heap):

This is the most common, and probably best option.

template>
class HashMap {
public:
    std::array>, 128 > m_table;
    // ...
};

This will result in lighter (in terms of memory footprint) HashMap instances.

Note: using an std::vector in place of an std::array will reduce the size of HashMap significantly, but will introduce an additional heap indirection. This is the common way of implementing a similar data structure. You generally want the HashMap instance to be as lightweight as possible, so that it can be copied/moved/stored efficiently.

There's no need to use smart pointers to connect the nodes between each other, as the nodes are owned exclusively by HashMap. A raw pointer will be sufficient.

template
class HashNode {
public:
    // ...
    HashNode* next_ptr = nullptr;
    auto& next() 
    { 
        assert(next_ptr != nullptr);
        return *next_ptr;
    }
};

The code above will work fine, assuming that the HashMap is still alive when accessing next.

Option 2 (store nodes in the map instance):

template>
class HashMap {
public:
    std::array, 128 > m_table;
    // ...
};

HashMap instances may be enormous, depending on the size of HashNode.

If you choose to store the nodes directly into the HashMap with no heap indirection, you will have to use an index to access the internal array, as moving/copying the HashMap around will change the memory address of the nodes.

template
class HashNode {
public:
    // ...
    int next_index = -1;
    auto& next(HashMap& map) 
    { 
        assert(next_index != -1);
        return map.m_table[next_index]; 
    }
};

Smart or raw pointers when implementing data structures?

Answers (1)

Related Questions