Alex
Alex

Reputation: 2299

Adding a value to a separately chained hash table C++

I'm still rather new to C++ and am having trouble implementing an add method that actually works. I've done a hash map in java, but translating it to C++ has proven to be difficult. On top of this, I have to work with constraints (such as not changing anything in the header file), and not being able to use any std library function besides std::string, and std::cout/cin.

Basically, I have to create a hash map that will end up storing usernames (as a key), and passwords (as a value). At this point the username/password combination isn't that important, since implementing a very general class is the point of the exercise.

Just trying to test my HashMap with a default constructor and adding a value ends up giving me a segmentation fault. I'm 100% sure that I'm doing something horribly wrong when trying to implement this hash table. Either I'm not properly connecting the bucket index with a node, or not initializing something correctly.

Here is the header file that I am working with:

#ifndef HASHMAP_HPP
#define HASHMAP_HPP

#include <functional>
#include <string>

class HashMap
{
public:
    // Hash functions must conform to these properties:
    //
    // (1) Given a particular string s repeatedly, they must always
    //     return the same hash value.
    // (2) They do not take the number of buckets into account (as they
    //     do not receive a parameter that tells them how many buckets
    //     there are).  Any unsigned int value is fair game as a result.
    //     It will be the job of the HashMap class to reduce the results
    //     to the range of available bucket indices (e.g., by using the
    //     % operator).
    typedef std::function<unsigned int(const std::string&)> HashFunction;

    // This constant specifies the number of buckets that a HashMap will
    // have when it is initially constructed.
    static constexpr unsigned int initialBucketCount = 10;


public:
    // This constructor initializes the HashMap to use whatever default
    // hash function you'd like it to use.  A little research online will
    // yield some good ideas about how to write a good hash function for
    // strings; don't just return zero or, say, the length of the string.
    HashMap();

    // This constructor instead initializes the HashMap to use a particular
    // hash function instead of the default.  (We'll use this in our unit
    // tests to control the scenarios more carefully.)
    HashMap(HashFunction hasher);

    // The "Big Three" need to be implemented appropriately, so that HashMaps
    // can be created, destroyed, copied, and assigned without leaking
    // resources, interfering with one another, or causing crashes or
    // undefined behavior.
    HashMap(const HashMap& hm);
    ~HashMap();
    HashMap& operator=(const HashMap& hm);

    // add() takes a key and a value.  If the key is not already stored in
    // this HashMap, the key/value pair is added; if the key is already
    // stored, the function has no effect.
    //
    // If adding the new key/value pair will cause the load factor of this
    // HashMap to exceed 0.8, the following must happen:
    //
    // (1) The number of buckets should be increased by doubling it and
    //     adding 1 (i.e., if there were 10 buckets, increase it to
    //     2 * 10 + 1 = 21).
    // (2) All key/value pairs should be rehashed into their new buckets,
    //     important because changing the number of buckets will likely
    //     change which bucket a particular key hashes to (especialy if
    //     you're using % to determine the index of that bucket).
    void add(const std::string& key, const std::string& value);

    // remove() takes a key and removes it (and its associated value) from
    // this HashMap if it is already present; if not, the function has no
    // effect.
    void remove(const std::string& key);

    // contains() returns true if the given key is in this HashMap, false
    // if not.
    bool contains(const std::string& key) const;

    // value() returns the value associated with the given key in this HashMap
    // if the key is stored in this HashMap; if not, the empty string is
    // returned.  (Going forward, we'll discover that throwing an exception
    // is a better way to handle the scenario where the key is not present,
    // but we'll conquer that at a later date.)
    std::string value(const std::string& key) const;

    // size() returns the number of key/value pairs stored in this HashMap.
    unsigned int size() const;

    // bucketCount() returns the number of buckets currently allocated in
    // this HashMap.
    unsigned int bucketCount() const;

    // loadFactor() returns the proportion of the number of key/value pairs
    // to the number of buckets, a measurement of how "full" the HashMap is.
    // For example, if there are 20 key/value pairs and 50 buckets, we would
    // say that the load factor is 20/50 = 0.4.
    double loadFactor() const;

    // maxBucketSize() returns the number of key/value pairs stored in this
    // HashMap's largest bucket.
    unsigned int maxBucketSize() const;


private:
    // This structure describes the nodes that make up the linked lists in
    // each of this HashMap's buckets.
    struct Node
    {
        std::string key;
        std::string value;
        Node* next;
    };


    // Store the hash function (either the default hash function or the one
    // passed to the constructor as a parameter) in this member variable.
    // When you want to hash a key, call this member variable (i.e., follow
    // it with parentheses and a parameter) just like you would any other
    // function.
    HashFunction hasher;


    // You will no doubt need to add at least a few more private members

    public:
    // our hash function
    unsigned int hashFunc(const std::string& key) const;

private:
    Node** hashTable;

    // We need a variable that will always let us know what the current amount
    // of buckets is. bucketCount will use this and return this variable.
    unsigned int amountOfBuckets;
    // we also need the number of keys currently in the hash map. This is stored here
    unsigned int sz;
};

#endif // HASHMAP_HPP

And here is how I am implementing my class (HashMap.cpp):

#include "HashMap.hpp"

// default constructor will initialize Node to default values
// Create a new hash table with the initial bucket count
// Set the amount of buckets to the initial bucket count
// Set the current amount of key/value pairs to zero.
HashMap::HashMap()
    : hashTable{new Node*[initialBucketCount]}, amountOfBuckets{initialBucketCount}, sz{0}
{
}


// constructor that initializes HashMap to use a different hash function other
// than the default
HashMap::HashMap(HashFunction hashFunc)
    : hasher{hashFunc}, hashTable{new Node*[initialBucketCount]}, amountOfBuckets{initialBucketCount}, sz{0}
{
}

// copy constructor, initializes a new HashMap to be a copy of an existing one
HashMap::HashMap(const HashMap& hm)
// commented out for now    :
{
}

// destructor: deallocate the HashMap
HashMap::~HashMap()
{
    // delete something here
}

// Assignment operator that overloads equals
HashMap& HashMap::operator=(const HashMap& hm)
{
    // FIX COMPILER WARNINGS, DELETE
    return *this;
}

// our hash function, this is for our type def HashFunction
// pass this through the constructor
unsigned int HashMap::hashFunc(const std::string& key) const
{
    unsigned int hashValue = 0; // what we end up returning
    for(int i = 0; i < key.length(); i++) { // iterate through string
        int letterIndex = key.at(i) - 96; // create an index for each ASCII char
        // first multiply our current hashValue by a prime number
        // add to the letter index, to maintain a stable result
        // mod by the current amount of buckets on each iteration to prevent overflow
        hashValue = (hashValue * 27 + letterIndex) % bucketCount();
    }
    return hashValue;
}

// add function
void HashMap::add(const std::string& key, const std::string& value)
{
    // Check if key being stored matches key already in hashmap
    /* BASIC ADD FUNCTION, JUST TO IMPLEMENT UNIT TESTS */
    unsigned int hashVal = hashFunc(key);
    //Node* prev = nullptr; // keeps track of where we are
    Node* current = hashTable[hashVal]; // the place we store a data item into
    while(current != nullptr) {
        //prev = current; // update previous node to point to current
        // this lets us move current without losing our place
        current = current->next; // move current to the next node
    } // stop once we find an empty node
    // current should equal a nullptr
    current->key = key; // set key (user)
    current->value = value; // set password
    current->next = nullptr; // set the next ptr to be null

}

// takes in a key (username), removes it and the value (password) associated
// with it, otherwise, it has no effect
void HashMap::remove(const std::string& key)
{
}

// returns true if given key is in hash map, otherwise returns false
// this acts as a find method
bool HashMap::contains(const std::string& key) const
{
    unsigned int hashedValue = hashFunc(key); // hash the key given to get an index
    if(hashTable[hashedValue] == nullptr) { // if there are no nodes at given index
        return false;
    } else { // there are some nodes in the hash table
        // iterate through each node in the linked list
        // Node* current = hashTable[hashedValue];
        // start at first node (this is current)
        Node* current = hashTable[hashedValue];
        while(current != nullptr && current->key == key) {
            current = current->next;
        } // end while
        if(current == nullptr) { // we reached the end of our linked list
            return false; // couldn't find a value
        } else { // we found the key provided
            return true;
        }
    } // end if-else
}

// value() returns the value associated with the given key in this HashMap
// if the key is stored in this HashMap; if not, the empty string is returned.
std::string HashMap::value(const std::string& key) const
{
    // HANDLES COMPILER WARNINGS, DELETE LATER
    return "";
}

// size() returns the number of key/value pairs stored in this HashMap.
unsigned int HashMap::size() const
{
    return sz;
}

// bucketCount() returns the number of buckets currently allocated in this HashMap.
// each bucket is an index for the array, we do not include the linked lists.
unsigned int HashMap::bucketCount() const
{
    return amountOfBuckets;
}

// loadFactor() returns the proportion of the number of key/value pairs
// to the number of buckets, a measurement of how "full" the HashMap is.
// For example, if there are 20 key/value pairs and 50 buckets, we would
// say that the load factor is 20/50 = 0.4.
double HashMap::loadFactor() const
{
    return sz / amountOfBuckets;
}

// maxBucketSize() returns the number of key/value pairs stored in this
// HashMap's largest bucket.
unsigned int HashMap::maxBucketSize() const
{
    // HANDLE COMPILER WARNINGS, DELETE LATER
    return 0;
}

The main method I am implementing right now is add. Currently I'm just trying to test the class with a main function to even see if I can add values to the map, and test to see if it recognizes whether things are being contained in the map or not. I realize that very little in the class is complete, and that the functions themselves are incomplete, however I'm just trying to test the most basic cases.

Finally, here is my main:

#include <iostream>
#include <string>
#include "HashMap.hpp"

int main()
{
    // initialize test
    HashMap test1;
    std::cout << "TEST 1 HASHMAP OBJECT CREATED" << std::endl;

    // add some values
    // at this point (11/16/2014), I only have contains, add, and hashFunc
    // test these methods below
    // constructor doesn't quite work right
    std::string key1 = "Alex";
    std::string value1 = "password1";
    std::string key2 = "Danielle";
    std::string value2 = "password2";

    std::cout << "strings have been created" << std::endl;

    // add to hash map
    test1.add(key1, value1);
    test1.add(key2, value2);

    std::cout << "Keys and values have been added to hash map" << std::endl;

    // does key1 contain the word "hi"? no, should return false
    std::cout << "Hash map contains word hi?: " << test1.contains("hi") << std::endl;
    // does key2 contain word "Danielle"? yes, should return true
    std::cout << "Hash map contains word Danielle?: " << test1.contains("Danielle") << std::endl;
    return 0;
}

I am using a pre-built script to run the program after I build it. When run, I get this output:

TEST 1 HASHMAP OBJECT CREATED
strings have been created
./run: line 43: 10085 Segmentation fault      (core dumped) $SCRIPT_DIR/out/bin/a.out.$WHAT_TO_RUN

Basically the segmentation fault happens during the add function. So what is actually going on with add? And how can I understand what the hash map should be doing better?

Upvotes: 1

Views: 3120

Answers (2)

Persixty
Persixty

Reputation: 8589

Your own comment that current should equal nullptr correctly foretells the failure at the next line:

// current should equal a nullptr
current->key = key; // set key (user)

It's generally bad practice to assume a newly allocated array will be full of nullptr. You need to set it to all nullptr in the constructor.

There are other problems with your add() function.

Here's a stab at making it at least fit for purpose:

void HashMap::add(const std::string& key, const std::string& value)
{
// Check if key being stored matches key already in hashmap
/* BASIC ADD FUNCTION, JUST TO IMPLEMENT UNIT TESTS */
unsigned int hashVal = hashFunc(key);
Node* head=hashTable[hashVal];    
Node* current = head; 
if(current==nullptr){
    //Nothing in this bucket so it's definitely new.
    current=new Node();
    current->key = key; // set key (user)
    current->value = value; // set password
    current->next = nullptr; // set the next ptr to be nullptr.
    hashTable[hashVal]=current;    
    return;
}
do { //It's a do-while because the if statement above has handled current==nullptr.
    if(current->key==key){
       //We've found a match for the key.
       //Common hash-table behavior is to overwrite the value. So let's do that.
       current->value=value;
       return;
    }
    current = current->next; // move current to the next node
} while(current != nullptr) // stop once we go past the last node.

//Finally, we found hash collisions but no match on key.
//So we add a new node and chain it to the node(s) already there.
//
//Sometimes it's a good idea to put the new one at the end or if it's likely to get looked up
//it's also a good idea to put it at the start.
//It might be a good idea to keep the collision chain sorted and insert into it accordingly.
//If we sort we can dive out of the loop above when we pass the point the key would be.
//
//However for a little example like this let's put the new node at the head.

current=new Node();
current->key = key; // set key (user)
current->value = value; // set password
current->next = head; // set the next pointer to be the old head.
hashTable[hashVal]=current;    

}

PS: There's also a problem with your hash-function. You're taking modulo the hash-table size inside the main loop. That will only have the effect of restricting the distribution and giving a dreadful spread. At least move it to the last line of that function:

return hashValue % bucketCount() ;

However I recommend moving it into the add function:

unsigned int hashVal = hashFunc(key); //Assume modified to not use bucketCount().
unsigned int tableIndex=hashVal % bucketCount();//Reduce hash to a valid index.
Node* head=hashTable[tableIndex];  //TODO: Do same in the other accesses to hashTable...

You could then store the full hash-value in the Node structure and use that as a stronger pre-comparison before current->key==key. If the hash-table is likely to be very full you can get a big performance gain. It depends if you want to spare the bytes for an unsigned int per Node. If you did that and you took the tip to sort the collision chain, you would do so by hash-code and could normally avoid comparing any strings at add() new key or get() not-there and normally only once in a successful get() or overwriting add().

Upvotes: 1

Arne Kjetil Andersen
Arne Kjetil Andersen

Reputation: 483

In your HashMap::add, you're dereferencing a nullpointer. You create an array of node pointers with size of 10 elements in your constructor, but in your code you never create any node objects and assign to the pointers in your array. So when you do:

 current->key = key;

you're accessing a node-pointer that is 0.

Upvotes: 0

Related Questions