I'm looking for an embeddable data storage engine in C++. RocksDB is a key-value store.
My data is very homogeneous. I have a modest number of types (on the order of 20), and I store many instances (on the order of 1 million) of those types.
I imagine that the homogeneity of my data makes RocksDB a poor choice. If I serialise each object individually, surely I'm duplicating the schema metadata? And surely that will result in poor performance?
So my question: Is RocksDB a good choice for storing homogeneous objects? If so, how does one avoid the performance implications of duplicating schema metadata?
Unlike, e.g., SQLite, RocksDB has no schema metadata, because there is no schema: it maps a binary key to a binary value. RocksDB has no serialization built into it. If you are storing objects, you will have to serialize them yourself and distinguish the types using, e.g., the key, a key prefix, or column families (roughly a lightweight version of database tables).
Typically you would use RocksDB to build some kind of custom database. Someone built, e.g., a cache for protobuf objects on top of it (ProfaneDB). I would often say it is too low-level, but if you don't need structured data or queries, it will work fine, is very fast, and is generally pleasant to work with (its code is readable, and sometimes the best documentation, because you will be dealing with database internals).
I have used a varint key prefix in a small toy application before, which costs just one byte per key for type ids up to 127, but column families are probably preferable for a production application. They have a constant overhead rather than a per-key one, and they can be individually tuned, added, dropped, and managed. I wouldn't give up the additional features you get from them to save a few bytes. That is also roughly representative of the level at which you will be dealing with problems if you go with RocksDB.
As I understand it, RocksDB is really a key-value store and not a database in the usual sense. This means you only get the facility to store binary key and value data. Unlike a normal database (e.g. MySQL, SQLite), you don't get tables where you can define the columns, types, etc.
Therefore, your program determines how the data is stored.
One possibility is to store your data as JSON values, in which case as you say you pay the cost of storing the "schema" (i.e. the JSON field names) in the values.
Another option is to have a special key (called, say, SCHEMA) that contains an Avro schema for all your object types. Your app can read this on startup, initialise the readers/writers, and then it knows how to process each key+value stored in RocksDB.
Yet another option is to hard-code the logic in your app. You could use any number of libraries for this, including Avro (as mentioned above) or MsgPack and its variants. In this case you need to be careful if you intend to read RocksDB data written by a previous version of the app after making schema changes. So you may want to store a version number or similar in the DB.