shytikov

Reputation: 9538

Extending BSON `$type` attribute for complex object?

I'm trying to store objects in MongoDB. These objects come from a third-party system and have a very specific format: all object properties are stored in a dictionary. Values in this dictionary can be of different types and appear in no particular order.

I believe that to search these fields effectively I need to turn them into BSON properties. That is doable with a custom serializer/deserializer, until it comes to deserialization itself. If a property is a complex object represented as a BSON document, the custom deserializer doesn't know which type this document should be transformed into.

How are issues like this properly solved with MongoDB BSON?

I would add a new property `$type` to the complex document and store the destination type there during serialization, but that interferes with the built-in MongoDB `$type` property.

Is it possible to use standard and custom `$type` attributes side by side? What's the best-practice approach for implementing a custom deserializer in this case?

Upvotes: 1

Views: 414

Answers (1)

bauman.space

Reputation: 2023

Not without extending the spec itself, or including in the document itself some reference to how it should be (de)serialized.

The PHP driver has an ODM framework that does exactly what you're proposing. I suggest you look at http://php.net/manual/en/class.mongodb-bson-persistable.php

During serialization, the driver will inject a __pclass property containing the PHP class name into the data

So, it adds a specific key, "__pclass", to the document to be stored. During deserialization, the driver reads that key to decide which deserialization steps to take, then strips the __pclass key/value before returning the document (now deserialized into whatever PHP class the __pclass key specifies) to the user.

This is incredibly dangerous if you have any reason to not trust the data held in mongodb. It's basically allowing data to dictate a call to executable PHP code.
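To make the idea concrete outside PHP, here is a minimal Python sketch of the same discriminator-key pattern. It is not the PHP driver's implementation; the key name `_type`, the `Address` class, and the `REGISTRY` allowlist are all hypothetical names chosen for illustration. The key deliberately avoids the `$` prefix so it does not collide with MongoDB operators, and the allowlist is one way to keep stored data from choosing arbitrary classes.

```python
from dataclasses import dataclass, asdict

# Hypothetical discriminator key (an assumption, not a MongoDB convention).
# It has no "$" prefix, so it cannot be confused with an operator like $type.
TYPE_KEY = "_type"


@dataclass
class Address:  # hypothetical example type
    city: str
    zip_code: str


# Allowlist: only classes listed here can ever be instantiated from stored data.
REGISTRY = {"Address": Address}


def to_document(obj):
    """Turn an object into a plain dict and record its type name."""
    doc = asdict(obj)
    doc[TYPE_KEY] = type(obj).__name__
    return doc


def from_document(doc):
    """Rebuild an object from a dict, using the stored type name."""
    fields = dict(doc)  # copy so the caller's dict is not modified
    type_name = fields.pop(TYPE_KEY)
    try:
        cls = REGISTRY[type_name]
    except KeyError:
        raise ValueError(f"unknown type tag: {type_name!r}") from None
    return cls(**fields)
```

Because the fields stay ordinary document fields, they remain queryable, and the extra key can itself be queried like any other field.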

About the spec itself: http://bsonspec.org/spec.html

The types and their associated type indexes are hard-coded into the spec.

element     ::=     "\x01" e_name double    64-bit binary floating point
    |   "\x02" e_name string    UTF-8 string
    |   "\x03" e_name document  Embedded document
    |   "\x04" e_name document  Array
    |   "\x05" e_name binary    Binary data
    |   "\x06" e_name   Undefined (value) — Deprecated
    |   "\x07" e_name (byte*12)     ObjectId
    |   "\x08" e_name "\x00"    Boolean "false"
    |   "\x08" e_name "\x01"    Boolean "true"
    |   "\x09" e_name int64     UTC datetime
    |   "\x0A" e_name   Null value
    |   "\x0B" e_name cstring cstring   Regular expression - The first cstring is the regex pattern, the second is the regex options string. Options are identified by characters, which must be stored in alphabetical order. Valid options are 'i' for case insensitive matching, 'm' for multiline matching, 'x' for verbose mode, 'l' to make \w, \W, etc. locale dependent, 's' for dotall mode ('.' matches everything), and 'u' to make \w, \W, etc. match unicode.
    |   "\x0C" e_name string (byte*12)  DBPointer — Deprecated
    |   "\x0D" e_name string    JavaScript code
    |   "\x0E" e_name string    Symbol. Deprecated
    |   "\x0F" e_name code_w_s  JavaScript code w/ scope
    |   "\x10" e_name int32     32-bit integer
    |   "\x11" e_name uint64    Timestamp
    |   "\x12" e_name int64     64-bit integer
    |   "\x13" e_name decimal128    128-bit decimal floating point
    |   "\xFF" e_name   Min key
    |   "\x7F" e_name   Max key
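To show what "hard-coded" means in practice, here is a small Python sketch that reads the one-byte type tag of the first element in a raw BSON document. It uses only the standard library; `first_element_type`, `TYPE_NAMES`, and the hand-encoded `DOC` bytes are illustrative names and data of my own, and only a subset of the tags from the table is included.

```python
import struct

# A subset of the type tags from the table above.
TYPE_NAMES = {
    0x01: "double",
    0x02: "string",
    0x03: "document",
    0x04: "array",
    0x05: "binary",
    0x10: "int32",
    0x12: "int64",
}


def first_element_type(bson_bytes):
    """Return (field name, type name) for the first element of a BSON document."""
    # A document starts with its total length as a little-endian int32.
    (total_len,) = struct.unpack_from("<i", bson_bytes, 0)
    if total_len != len(bson_bytes):
        raise ValueError("length prefix does not match the data")
    tag = bson_bytes[4]  # the single type byte comes right after the length
    if tag == 0x00:
        return None  # empty document: only the terminator follows
    name_end = bson_bytes.index(b"\x00", 5)  # e_name is a NUL-terminated cstring
    name = bson_bytes[5:name_end].decode("utf-8")
    return name, TYPE_NAMES.get(tag, hex(tag))


# Hand-encoded {"a": 1}: length, tag 0x10 (int32), name "a", value 1, terminator.
DOC = b"\x0c\x00\x00\x00" b"\x10" b"a\x00" b"\x01\x00\x00\x00" b"\x00"
```

There is only one byte for the tag and every value is assigned by the spec, so a reader that meets a tag it does not know has no way to tell how long the element is.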

You could create your own user-defined binary subtype if you stored the blob in a binary block, using the user-defined subtype range.

binary  ::=     int32 **subtype** (byte*)   Binary - The int32 is the number of bytes in the (byte*).

subtype     ::=     "\x00"  Generic binary subtype
    |   "\x01"  Function
    |   "\x02"  Binary (Old)
    |   "\x03"  UUID (Old)
    |   "\x04"  UUID
    |   "\x05"  MD5
    |   **"\x80"    User defined**

The downside is that the object would be stored in the database as a binary blob, making it very difficult to query beyond subtype checking.
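As a rough illustration of the `binary` production above, this Python sketch packs and unpacks a payload with a user-defined subtype byte. It encodes only the binary value (length, subtype, bytes), not a whole BSON element or document; `encode_binary`, `decode_binary`, and `USER_DEFINED` are hypothetical names, and how you serialize your object into the payload bytes is left open.

```python
import struct

USER_DEFINED = 0x80  # first value of the user-defined subtype range


def encode_binary(payload, subtype=USER_DEFINED):
    """Encode a BSON binary value: int32 length, subtype byte, then the bytes."""
    return struct.pack("<i", len(payload)) + bytes([subtype]) + payload


def decode_binary(data):
    """Decode a BSON binary value into (subtype, payload)."""
    (length,) = struct.unpack_from("<i", data, 0)
    subtype = data[4]
    payload = data[5:5 + length]
    if len(payload) != length:
        raise ValueError("truncated binary value")
    return subtype, payload
```

A deserializer could dispatch on the subtype byte to pick a decoder for the payload, but the database sees only opaque bytes, which is exactly the querying limitation described above.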

Anything beyond that would involve extending the specification itself.

Upvotes: 1
