Reputation: 9538
I'm trying to store objects in MongoDB. These objects come from a third-party system and have a very specific format: all object properties are stored in a dictionary. Values in this dictionary can be of different types and appear in no particular order.
I believe that to search on these fields effectively I need to turn them into BSON properties. This is doable with a custom serializer/deserializer, until it comes to deserialization itself. If a property is a complex object represented as a BSON document, the custom deserializer doesn't know which type this document should be transformed into.
How are issues like this properly solved with MongoDB BSON?
I would add a new $type property to the complex document and store the destination type there during serialization, but this interferes with the built-in MongoDB $type property.
Is it possible to use the standard and custom $type attributes side by side? What is the best-practice approach for implementing a custom deserializer in this case?
Upvotes: 1
Views: 414
Reputation: 2023
Not without extending the spec itself, or including in the document some reference to how it should be (de)serialized.
The PHP driver has an ODM framework that does exactly what you're proposing. I suggest you look at http://php.net/manual/en/class.mongodb-bson-persistable.php
During serialization, the driver will inject a __pclass property containing the PHP class name into the data.
So, it adds a specific key, "__pclass", to the document to be stored. During deserialization, the driver reads that key to decide which deserialization steps to take, and strips the __pclass key/value before returning the document (now deserialized into whatever PHP class the __pclass key names) to the user.
This is incredibly dangerous if you have any reason not to trust the data held in MongoDB. It basically allows data to dictate a call to executable PHP code.
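The same idea can be sketched outside PHP. The snippet below is only an illustration in plain Python, not the PHP driver's implementation and not tied to any MongoDB driver: the key name "_t", the Address and Money classes, and the REGISTRY allow-list are all hypothetical. It avoids a "$"-prefixed key because $type is already a MongoDB query operator, and it only instantiates classes that were registered explicitly, so stored data cannot pick an arbitrary class.

```python
from dataclasses import dataclass

TYPE_KEY = "_t"  # hypothetical discriminator key; deliberately not "$type"


@dataclass
class Address:  # hypothetical complex value type
    city: str
    zip_code: str


@dataclass
class Money:  # hypothetical complex value type
    amount: int
    currency: str


# Allow-list: only these classes may ever be rebuilt from stored data.
REGISTRY = {cls.__name__: cls for cls in (Address, Money)}


def to_document(value):
    """Turn a value into plain dicts/lists, tagging registered objects."""
    if isinstance(value, tuple(REGISTRY.values())):
        doc = {k: to_document(v) for k, v in vars(value).items()}
        doc[TYPE_KEY] = type(value).__name__
        return doc
    if isinstance(value, dict):
        return {k: to_document(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_document(v) for v in value]
    return value


def from_document(value):
    """Rebuild registered objects from tagged documents."""
    if isinstance(value, list):
        return [from_document(v) for v in value]
    if not isinstance(value, dict):
        return value
    fields = {k: from_document(v) for k, v in value.items() if k != TYPE_KEY}
    if TYPE_KEY not in value:
        return fields  # an ordinary embedded document, no type tag
    cls = REGISTRY.get(value[TYPE_KEY])
    if cls is None:
        raise ValueError(f"unregistered type tag: {value[TYPE_KEY]!r}")
    return cls(**fields)


# Usage: the third-party property dictionary round-trips through plain documents.
props = {"home": Address("Oslo", "0150"), "price": Money(5, "NOK"), "tags": ["a", "b"]}
stored = to_document(props)
restored = from_document(stored)
```

Because the tagged values stay ordinary embedded documents, their fields remain queryable, for example by a dotted path such as home.city.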
About the spec itself: http://bsonspec.org/spec.html
The types and their associated type bytes are hard-coded into the spec.
element ::= "\x01" e_name double 64-bit binary floating point
| "\x02" e_name string UTF-8 string
| "\x03" e_name document Embedded document
| "\x04" e_name document Array
| "\x05" e_name binary Binary data
| "\x06" e_name Undefined (value) — Deprecated
| "\x07" e_name (byte*12) ObjectId
| "\x08" e_name "\x00" Boolean "false"
| "\x08" e_name "\x01" Boolean "true"
| "\x09" e_name int64 UTC datetime
| "\x0A" e_name Null value
| "\x0B" e_name cstring cstring Regular expression - The first cstring is the regex pattern, the second is the regex options string. Options are identified by characters, which must be stored in alphabetical order. Valid options are 'i' for case insensitive matching, 'm' for multiline matching, 'x' for verbose mode, 'l' to make \w, \W, etc. locale dependent, 's' for dotall mode ('.' matches everything), and 'u' to make \w, \W, etc. match unicode.
| "\x0C" e_name string (byte*12) DBPointer — Deprecated
| "\x0D" e_name string JavaScript code
| "\x0E" e_name string Symbol. Deprecated
| "\x0F" e_name code_w_s JavaScript code w/ scope
| "\x10" e_name int32 32-bit integer
| "\x11" e_name uint64 Timestamp
| "\x12" e_name int64 64-bit integer
| "\x13" e_name decimal128 128-bit decimal floating point
| "\xFF" e_name Min key
| "\x7F" e_name Max key
You could define your own binary subtype, within the user-defined subtype range, if you stored the object as a blob in a binary element.
binary ::= int32 **subtype** (byte*) Binary - The int32 is the number of bytes in the (byte*).
subtype ::= "\x00" Generic binary subtype
| "\x01" Function
| "\x02" Binary (Old)
| "\x03" UUID (Old)
| "\x04" UUID
| "\x05" MD5
| **"\x80" User defined**
The downside is that the object would be stored in the database as a binary blob, making it very difficult to query beyond subtype checking.
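To make the binary layout concrete, here is a rough sketch that hand-encodes a one-element document following the grammar quoted above, using only the Python standard library. The function names are made up for this example, and in practice a driver's BSON library would do this for you; the point is only to show where the subtype byte sits and why the payload is opaque to queries.

```python
import struct

USER_DEFINED_SUBTYPE = 0x80  # the "User defined" subtype from the grammar above


def encode_binary_element(name, payload, subtype=USER_DEFINED_SUBTYPE):
    # element ::= "\x05" e_name binary
    # binary  ::= int32 subtype (byte*)   (int32 = number of payload bytes)
    e_name = name.encode("utf-8") + b"\x00"  # e_name is a NUL-terminated cstring
    return b"\x05" + e_name + struct.pack("<i", len(payload)) + bytes([subtype]) + payload


def encode_document(elements):
    # document ::= int32 e_list "\x00"   (int32 = total size, including itself)
    total = 4 + len(elements) + 1
    return struct.pack("<i", total) + elements + b"\x00"


def decode_single_binary(doc):
    # Decode a document that holds exactly one binary element.
    (total,) = struct.unpack_from("<i", doc, 0)
    if total != len(doc) or doc[-1] != 0:
        raise ValueError("malformed document")
    if doc[4] != 0x05:
        raise ValueError("first element is not binary")
    name_end = doc.index(b"\x00", 5)
    name = doc[5:name_end].decode("utf-8")
    (length,) = struct.unpack_from("<i", doc, name_end + 1)
    subtype = doc[name_end + 5]
    start = name_end + 6
    return name, subtype, doc[start:start + length]


# Usage: the payload could be any application-specific serialization.
doc = encode_document(encode_binary_element("obj", b"abc"))
name, subtype, payload = decode_single_binary(doc)
```

A custom deserializer could dispatch on the subtype byte, but everything inside the payload is invisible to the server's query engine.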
Anything beyond that would involve extending the specification itself.
Upvotes: 1