Hymir
Hymir

Reputation: 931

C++ Aware duplication insert into a std::map

I have a question about inserting something to a std::map in C++.

Thats my Code sofar:

stringutils.hh:

...

  unsigned long hashSDBM(char *strToHash){
      unsigned char* str = new unsigned char[strlen(strToHash) + 1];
      strncpy( (char *) str, strToHash, strlen(strToHash) );

      unsigned long hash = 0;
      int c;

      while ((c = *str++)){
          hash = c + (hash <<6) + (hash <<16) - hash;
      }

      return hash;
  }

...

hashmap.hh

#include "stringutils.hh"

namespace{

using namespace std;

class MapElement{

    private:
        char* filename;
        char* path;

    public:
        MapElement(char* f, char* p):filename(f), path(p){}
        ~MapElement(){
           delete [] filename;
           delete [] path;
        }
        char* getFileName(){ return filename; }
        char* getPath(){ return path; }

};


class HashMap{

    private:
        map<long*, MapElement*> *hm;

        long hash(char* key);

    public:
        HashMap(){
           hm = new map<long*, MapElement*>();
        }
        ~HashMap(){
           delete hm;
        }
        long put(char* k, MapElement *v);
};

long HashMap::hash(char* key){
  return stringutils::hashSDBM(key);
}


long HashMap::put(char* k, MapElement *v){
  long *key = new long();
  *key = hash(k);
  pair<map<long*,MapElement*>::iterator, bool> ret;
  ret = hm->insert(std::pair<long*, MapElement*>(key, v));

  if(ret.second == false){
    cerr<<"Already exists: "<<ret.first->second->getFileName()<<endl;
    return *key;
  }
  cerr<<"INSERTED "<<*key<<endl;
  return 0;
}

main.cc:

HashMap *hm = new HashMap();


int main(void){

  MapElement *m1; 

  char a[] = "hello";
  char b[] = "world";
  m1 = new MapElement(a,b);
  hm->put(a, m1);

  char c[] = "thats";
  char d[] = "a test";
  m1 = new MapElement(c,d);
  hm->put(c, m1);

  char e[] = "hello";
  char f[] = "test";
  m1 = new MapElement(e,f);
  hm->put(e, m1);

  return 0;
}

It's compiles whitout any errors or warnings and when I start it, the following output is generatéd:

INSERTED 7416051667693574450

INSERTED 8269306963433084652

INSERTED 7416051667693574450

Why does the second insert off the key "hello" doesnt have any effect?

Upvotes: 0

Views: 234

Answers (3)

Matthieu M.
Matthieu M.

Reputation: 299790

Goodness... please read a good C++ book before going any further, there are good books recommended in the C++ tag description.


So, the issue here is that your code uses pointers... everywhere... and pointers do not behave like you think they do. Many languages such as Java have pervasive reference types: everything is just a reference. C++ is not such a language, it makes a big difference between pointers/references on the one hand and values on the other.

In your specific case, long* is a pointer to a long. As far as map is concerned, two distinct pointers are just that: distinct, no matter the value of what they point to.

So... we need to get rid of those pointers. Everywhere. And stop using C idioms in C++.


stringutils.hh

  unsigned long hashSDBM(std::string const& strToHash){
      unsigned long hash = 0;

      for (char c: strToHash) {
          hash = c + (hash <<6) + (hash <<16) - hash;
      }

      return hash;
  }

In short:

  • don't use raw char* in C++, memory ownership is unclear which leads to leaks/dangling pointers
  • use const appropriately, a function that does not modify its argument should take const references to them
  • use C++11 for style loops, they are as efficient as manual code whilst being much easier to read and much more harder to screw up

hashmap.hh

namespace HashMap {

class MapElement{
public:
    MapElement(std::string f, std::string p):
        filename(f), path(p) {}

    std::string const& getFileName() const { return filename; }
    std::string const& getPath() const { return path; }

private:
    std::string filename;
    std::string path;
};

Let's start here:

  • No anonymous namespaces in headers, it doesn't do what you think it does (read on them)
  • No raw pointers
  • No fiddling with resources in business classes
  • const-correctness matters
  • Present the public API first, it's what users are interested in

Onward:

class HashMap{
public:
    unsigned long put(std::string const& k, MapElement v);

private:
    static unsigned long hash(std::string const& key);

    std::map<unsigned long, MapElement> hm;
};

inline unsigned long HashMap::hash(std::string const& key){
    return stringutils::hashSDBM(key);
}

inline unsigned long HashMap::put(std::string const& k, MapElement v){
    unsigned long const key = hash(k);

    auto const ret = hm.emplace(key, v);

    if (ret.second == false){
        std:: cerr << "Already exists: " << ret.first->second.getFileName() << "\n";
        return key;
    }
    std::cerr << "INSERTED " << key << "\n";
    return 0;
}

Alright...

  • No need for so many pointers, the code is simpler without them!
  • The internal hash function does not access any state, make it static
  • Declare variables at the last moment possible, and initialize them immediately... it let you use auto rather than name overly complicated types explicitly
  • std::endl does not do what you think it does (hint: it flushes the buffer! Slowest possible I/O operation!), just use plain "\n" instead

Further notes:

  • Why let the user submit the key, if said key has to be the filename? You can read it from the MapElement object instead... or print the key (and not the filename) when there is a collision, in case they differ
  • Hashes are not unique, if two different filenames hash to the same number you will reject the second... you should use a composite key (hash + filename)
  • You return 0 when inserting, and key when not... but nothing prevents the key from being 0 so receiving a 0 leaves the user doubting what happened

main.cpp

int main(void){
    HashMap::HashMap hm;

    hm.put("hello", MapElement("hello", "world"));
    hm.put("thats", MapElement("thats", "a test"));
    hm.put("hello", MapElement("hello", "test"));

    return 0;
}

And for the finish:

  • Avoid globals
  • No need to name all temporaries

Upvotes: 0

yuri kilochek
yuri kilochek

Reputation: 13486

Why does the second insert of the key doesnt have any effect?

Your key is a pointer, and two pointers to different long objects that have the same value are different keys. You would really help yourself by not using pointers so excessively. C++ is not Java.

Upvotes: 1

Armen Tsirunyan
Armen Tsirunyan

Reputation: 132994

The keys in the std::map are unique. If you want to allow duplicate keys, use std::multimap. The map::insert you're using returns a pair of iterator and a bool. The bool indicates if the insertion has actually inserted or not (not if the key was already there).

Upvotes: 2

Related Questions