TheBat
TheBat

Reputation: 1044

Two separate keys map to a single entry in a std::map

I'm working on a piece of code that has the objective of being a fast "search engine". I have entries in a file that need to be searchable after reading the whole file in. They need to be searchable by both the name of the entry, and it's offset from the beginning of the file. My problem is one of memory usage since there are millions of entries. Currently I am using two separate std::maps to store the data so that either search term can be specified. This leads to double storage of the data which is what I am trying to reduce.

I've used valgrind massif to find that a major part of the memory usage is the double storage of entries.

Current method of storage:

struct entry {
    std::string name;
    uint16_t offset;
    uint16_t size;
    bool isConst;
};

nameSearchMap.insert(std::pair<std::string, entry>(s_entry.name, e_entry));
offsetSearchMap.insert(std::pair<uint16_t, SymInfo>(s_entry.offset, s_entry));

Is there a way that I can can make a single map that is searchable by either type of key?

Upvotes: 2

Views: 3325

Answers (4)

You can use Boost.MultiIndex as follows:

Live Coliru Demo

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/member.hpp>
#include <cstdint>
#include <string>

struct entry {
  std::string name;
  uint16_t offset;
  uint16_t size;
  bool isConst;
};

using namespace boost::multi_index;

using entry_map=multi_index_container<
  entry,
  indexed_by<
    hashed_unique<tag<struct by_name>,member<entry,std::string,&entry::name>>,
    hashed_unique<tag<struct by_offset>,member<entry,uint16_t,&entry::offset>>
  >
>;

#include <iostream>

int main()
{
  entry_map m={{"hello",0,10,false},{"hi",1,20,true},{"bye",2,15,false}};
  auto&     name_map=m.get<by_name>();
  auto&     offset_map=m.get<by_offset>();

  std::cout<<"size(\"hello\"): "<<name_map.find("hello")->size<<"\n";
  std::cout<<"size(2): "<<offset_map.find(2)->size<<"\n";
}

Output:

size("hello"): 10
size(2): 15

Upvotes: 0

oklas
oklas

Reputation: 8220

You can save three poiner per entry, and avoid entry duplication. But method usable only if insertion is not so often than fetch:

std:vector<std::string> names;
std:vector<entry> entries;
std:map<uint16_t,size_t> offset2pos;

names and entryes must be sorted identically, so pos of name equal entry with that name respectively

names[pos] == entries[pos].name

and map offset2pos can convert offset to pos

size_t pos = offset2pos[offset]

to convert name to pos use binary search (std::lower_bound)

size_t pos = names.begin() - std::lower_bound( names.begin(), names.end(), offset )

This technuque good for that cases when insertion is not so often than fetch. But algoritm is no so trivial, and you need to reassign offset2pos (possible rebuild) each time you insert one or more elements.

In the answer above You need save per element:

std::string, uint16_t, 2 * Sptr, 4 * Mptr

In the my approuch :

std::string, uint16_t, Sptr, 2 * Mptr

Where: Sptr - is smaprt pointer object internal pointer Mptr - map binary three node left and right pointer

So 3 pointer per entry is your memory profit.

Upvotes: 0

R Sahu
R Sahu

Reputation: 206557

They need to be searchable by both the name of the entry, and it's offset from the beginning of the file.

This indicates to me that you can store the data in an array/vector and use maps that map to indices in the array/vector.

std::vector<entry> entries;
std::map<std::string, size_t> map1;
std::map<uint16_t, size_t> map2;

Upvotes: 2

Ami Tavory
Ami Tavory

Reputation: 76297

You might consider using

std::map<std::string, std::shared_ptr<entry>>

for mapping strings to an entry, and

std::map<uint16_t, std::shared_ptr<entry>>

Note that by using shared pointers for the value payload (thus using the same entry object for both maps), you save the size of a payload. While you pay for two shared pointers, you'll still come out ahead for your specific structure.

enter image description here

(felt like drawing a diagram. The point, though, is that there is only one entry object in memory.)


You might also be interested in boost::bimap.

Upvotes: 6

Related Questions