user187676

Optimize emulated flatbuffer dictionary

My flatbuffers schema file dict.fbs looks like this:

namespace fbs;

table Dict {
    entries:[DictEntry];
}

table DictEntry {
    key:string (key);
    value:string;
}

root_type Dict;

Now, according to the documentation, you can emulate a dictionary in FlatBuffers with a sorted vector and a binary search lookup, like this:

#include <fstream>
#include <string>
#include <vector>

#include "dict_generated.h"  // generated by flatc from dict.fbs

using namespace fbs;

flatbuffers::FlatBufferBuilder builder(1024);

std::string key, value;
std::ifstream infile(argv[1]);
std::string outfile(argv[2]);

std::vector<flatbuffers::Offset<DictEntry>> entries;

// Each consecutive pair of lines in the input file becomes one DictEntry.
while (std::getline(infile, key) && std::getline(infile, value)) {
    entries.push_back(CreateDictEntryDirect(builder, key.c_str(), value.c_str()));
}

// CreateVectorOfSortedTables sorts the entries by the field marked (key),
// so a binary search (LookupByKey) works on the result.
auto vec = builder.CreateVectorOfSortedTables(&entries);
auto dict = CreateDict(builder, vec);

builder.Finish(dict);
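
The lookup side is not shown above; as a rough sketch (assuming the flatc-generated header is dict_generated.h and buffer_ptr is a placeholder for the loaded bytes), it would look something like:

// Read side sketch: buffer_ptr points at the serialized buffer.
const fbs::Dict *root = fbs::GetDict(buffer_ptr);

// LookupByKey does the binary search over the vector that was built
// with CreateVectorOfSortedTables above.
const fbs::DictEntry *entry = root->entries()->LookupByKey("somekey");
if (entry) {
    const char *value = entry->value()->c_str();
    // use value ...
}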

My original word list is 32 MB on disk. For each word in this list I have a normalized key and a corresponding value. It would seem logical for the serialized FlatBuffers dict to be about twice that size on disk, say 64 MB, but in reality the output is 111 MB.

Can I optimize this schema to be more compact? What blows the output up to almost four times the original size?

Upvotes: 2

Views: 1644

Answers (1)

Aardappel

Reputation: 6074

Are the strings relatively small? What is the average length?

Your overhead per entry will be: 2 strings, each with a 32-bit length field (plus a null terminator and possible padding); then 12 bytes per DictEntry table (vtable offset + 2 string offsets); then another 32-bit offset to that table in the vector. So yes, if the strings are small, it is quite possible for the size to blow up that much.
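
To put rough numbers on it (purely illustrative, assuming 6-character keys and values; FlatBuffers strings are padded to 4-byte alignment):

    input per pair:   2 x (6 chars + 1 newline)                      = 14 bytes
    output strings:   2 x (4 length + 6 chars + 1 nul, padded to 12) = 24 bytes
    output table:     4 (vtable offset) + 2 x 4 (string offsets)     = 12 bytes
    vector slot:      4 (offset to the table)                        =  4 bytes
    output per pair:                                                  ~40 bytes

That is already close to a 3x blow-up before counting the vtable and the file header, so the fixed per-entry cost dominates whenever the strings themselves are short.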

Note that if you used a std::map<std::string, std::string> you'd probably end up using even more memory.

I recommend you try the same thing with FlexBuffers (https://google.github.io/flatbuffers/flexbuffers.html), which has a more compact string representation, and for your purpose should be the same speed (since your data is "stringly typed" anyway).
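
A rough, untested sketch of what the FlexBuffers version could look like (this assumes the C++ API from flatbuffers/flexbuffers.h and a placeholder input file name; a FlexBuffers map keeps its keys sorted, so lookups are still a binary search):

#include <fstream>
#include <string>

#include "flatbuffers/flexbuffers.h"

// Build: every key/value pair becomes an entry in a single FlexBuffers map,
// with no per-entry table or vtable offsets.
flexbuffers::Builder fbb;
std::string key, value;
std::ifstream infile("wordlist.txt");  // placeholder input file

fbb.Map([&]() {
    while (std::getline(infile, key) && std::getline(infile, value)) {
        fbb.String(key.c_str(), value.c_str());
    }
});
fbb.Finish();
const std::vector<uint8_t> &buf = fbb.GetBuffer();

// Read back: map keys are stored sorted, so operator[] does a binary search.
auto dict = flexbuffers::GetRoot(buf).AsMap();
std::string v = dict["somekey"].AsString().str();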

Upvotes: 1
