Alexander Vassilev
Alexander Vassilev

Reputation: 1439

In a hashmap/unordered_map, is it possible to avoid data duplication when the value already contains the key

Given the following code:

struct Item
{
    std::string name;
    int someInt;
    string someString;
    Item(const std::string& aName):name(aName){}
};
std::unordered_map<std::string, Item*> items;
Item* item = new Item("testitem");
items.insert(make_pair(item.name, item);

The item name will be stored in memory two times - once as part of the Item struct and once as the key of the map entry. Is it possible to avoid the duplication? With some 100M records this overhead becomes huge.

Note: I need to have the name inside the Item structure because I use the hashmap as index to another container of Item-s, and there I don't have access to the map's key values.

Upvotes: 5

Views: 1424

Answers (5)

Luc Touraille
Luc Touraille

Reputation: 82171

Assuming the structure you use to store your items in the first place is a simple list, you could replace it with a multi-indexed container.

Something along thoses lines (untested) should fulfill your requirements:

typedef multi_index_container<
    Item,
    indexed_by<
        sequenced<>,
        hashed_unique<member<Item, std::string, &Item::name
    >
> itemContainer;

itemContainer items;

Now you can access items either in their order of insertion, or look them up by name:

itemContainer::nth_index<0>::type & sequentialItems = items.get<O>();
// use sequentialItems as a regular std::list

itemContainer::nth_index<1>::type & associativeItems = items.get<1>();
// uses associativeItems as a regular std::unordered_set

Depending on your needs, you can use other indexings as well.

Upvotes: 2

Jan Hudec
Jan Hudec

Reputation: 76386

No, there isn't. You can:

  • Not store name in Item and pass it around separately.
  • Create Item, ItemData that has the same fields as Item except the name and either
    • derive Item from std::pair<std::string, ItemData> (= value_type of the type) or
    • make it convertible to and from that type.
  • Use a reference to string for the key. You should be able to use std::reference_wrapper<const std::string> as key and pass key in std::cref(value.name) for key and std::cref(std::string(whatever)) for searching. You may have to specialize std::hash<std::reference_wrapper<const std::string>>, but it should be easy.
  • Use std::unordered_set, but it has the disadvantage that lookup creates dummy Item for lookup.
    • When you actually have Item * as value type, you can move the name to a base class and use polymorphism to avoid that disadvantage.
  • Create custom hash map, e.g. with Boost.Intrusive.

Upvotes: 1

user1773602
user1773602

Reputation:

OK, since you say you are using pointers as values, I hereby bring my answer back to life.

A bit hacky, but should work. Basicly you use pointer and a custom hash function

struct Item
{
    std::string name;
    int someInt;
    string someString;
    Item(const std::string& aName):name(aName){}

    struct name_hash  
    { 
       size_t operator() (std::string* name)
       {
           std::hash<std::string> h;
           return h(*name);
       }
    };
};
std::unordered_map<std::string*, Item*, Item::name_hash> items;
Item* item = new Item ("testitem");
items.insert(make_pair(&(item->name), item);

Upvotes: 4

Matthieu M.
Matthieu M.

Reputation: 300439

TL;DR If you are using libstdc++ (coming with gcc) you are already fine.

There are 3 ways, 2 are "simple":

  • split your object in two Key/Value, and stop duplicated the Key in the Value
  • store your object in a unordered_set instead

The 3rd one is more complicated, unless provided by your compiler:

  • use an implementation of std::string that is reference counted (such as libstdc++'s)

In this case, when you copy a std::string into another, the reference counter of the internal buffer is incremented... and that's all. Copy is deferred to a time where a modification is requested by one of the owners: Copy On Write.

Upvotes: 1

Denis Ermolin
Denis Ermolin

Reputation: 5556

Don't store std::string name field in your struct. Anyway when you perform lookup you already know name field.

Upvotes: 1

Related Questions