Reputation: 31390
I have a certain object type that is stored in a database. This type now gets additional information associated with it which will differ in structure among instances. Although for groups of instances the information will be identically structured, the structure will only be known at runtime and will change over time.
I decided to just add a blob field to the table and store the key/value pairs there in some serialized format. From your experience, what format is most advisable?
In the context of my application, the storage space for this is secondary. There's one particular operation that I want to be fast, which is looking up the correct instance for a given set of key / value pairs (so it's a kind of variable-field composite key). I guess that means, is there a format that plays particularly well with typical database indexing?
Additionally, I might be interested in looking for a set of instances that share the same set of keys (an adhoc "class", if you wish).
I'm writing this in Java and I'm storing in various types of SQL databases. I've got JSON, GPB and native Java serialization on my radar, favouring the cross-language formats. I can think of two basic strategies:
Upvotes: 0
Views: 1283
Reputation: 15000
If your goal is to take advantage of database indexes, storing the unstructured data in a BLOB is not going to be effective. BLOBs are essentially opaque from the RDBMS's perspective.
I gather from your description that the unstructured part of the data takes the form of an arbitrary set of key-value pairs associated with the object, right? Well, if the types of all keys are the same (e.g. they're all strings), I'd recommend simply creating a child table with (at least) three columns: the key, the value, and a foreign key to the parent object's row in its table. Since the keys will then be stored in the database as a regular column, they can be indexed effectively. The index should also include the foreign key to the parent table.
A completely different approach would be to look at a "schemaless" database engine like CouchDB, which is specifically designed to deal with unstructured data. I have zero experience with such systems and I don't know how well the rest of your application would lend itself to this alternative storage strategy, but it might be worth looking into.
Upvotes: 1
Reputation: 35276
Not really an anwser to your question, but did you considered looking at the Java Edition of BerkeleyDB ? Duplicate keys and serialized values can be stored with this (fast) engine.
Upvotes: 1