What is the algorithmic complexity of the algorithm that OrientDB uses to look up (and parse) property names on disk or in memory for each document?

Question

To give a little bit of background: I am currently looking for a database solution to store and work with large amounts of satellite and other weather data. OrientDB already has some very interesting properties that make it one of my top contenders.

These include:

Flexible document structure (partial schema approach)
Lack of table joins (constant time traversal)
Low-cost PIVOT operations
Geospatial functions

Good job OrientDB designers! I've been telling my friends and coworkers for a while that these are some properties that are possible and could be very interesting to see in a database.

My only concern is that some of these properties may come with a hidden cost. In traditional relational databases, columns (roughly the equivalent of properties) have fixed names, so a name only needs to be looked up once. The stored data, on disk or in memory, is then referenced by a fixed offset, which gives a roughly constant-time lookup of the value's location. With document-based databases, it is my understanding that each property name is stored repeatedly with each document. I would assume this means extra overhead for locating and parsing each property name again for every document.

Consequently, my question is largely: what sort of overhead is involved in terms of additional algorithmic complexity? Additionally, is there anything that can be, or has been, done to mitigate this overhead inside the database itself? For example, indexes that point directly to the values, or storing values at fixed locations/offsets for properties that have been declared as mandatory in the schema.

Thanks in advance! - Chris

Lvca · Accepted Answer

If you declare your properties in the schema, OrientDB optimizes the reading/writing of records by storing a number, the property id, instead of the property name.
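To make the saving concrete, here is a toy sketch in Python. It is not OrientDB's actual on-disk format: the schema mapping `SCHEMA_IDS`, the one-byte ids, and the length-prefixed name encoding are all assumptions made for illustration.

```python
import struct

# Hypothetical schema: maps declared property names to small integer ids.
SCHEMA_IDS = {"station": 0, "timestamp": 1, "temperature": 2}

def header_with_names(names):
    """Encode each property name as a 1-byte length followed by its UTF-8 bytes."""
    out = bytearray()
    for name in names:
        raw = name.encode("utf-8")
        out += struct.pack(">B", len(raw)) + raw
    return bytes(out)

def header_with_ids(names, schema_ids):
    """Encode each declared property as a one-byte id instead of its name."""
    return bytes(schema_ids[name] for name in names)

names = ["station", "timestamp", "temperature"]
by_name = header_with_names(names)
by_id = header_with_ids(names, SCHEMA_IDS)
print(len(by_name), len(by_id))  # prints: 30 3
```

In this sketch the names cost 30 bytes per record, while the ids cost 3, and comparing an integer id is cheaper than comparing a string.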

OrientDB stores each record with a header that lists all the fields and the location (pointer) of each value inside the record. Finding a property therefore means scanning the header, which is linear in the number of properties in the record. In the worst case, if a record (document, vertex or edge) has 50 properties and you're looking for the last one, OrientDB will skip the 49 entries before it.
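The header scan described above can be sketched as follows. This is a minimal illustration under assumed details, not OrientDB's real serialization: the layout (a 1-byte field count, then a 2-byte property id and 4-byte offset per field, then length-prefixed UTF-8 values) and the helper names `build_record` and `read_field` are hypothetical.

```python
import struct

def build_record(fields):
    """Pack (property id, string value) pairs into one byte string.

    Hypothetical layout: 1-byte field count, then a (2-byte id, 4-byte offset)
    header entry per field, then the values, each with a 2-byte length prefix.
    """
    header_size = 1 + 6 * len(fields)
    header = bytearray([len(fields)])
    body = bytearray()
    for prop_id, value in fields:
        raw = value.encode("utf-8")
        # The offset points at this value's length prefix inside the record.
        header += struct.pack(">HI", prop_id, header_size + len(body))
        body += struct.pack(">H", len(raw)) + raw
    return bytes(header + body)

def read_field(record, wanted_id):
    """Scan the header linearly; return (value, header entries examined)."""
    count = record[0]
    for i in range(count):
        prop_id, offset = struct.unpack_from(">HI", record, 1 + 6 * i)
        if prop_id == wanted_id:
            # Jump straight to the value; no other value is decoded.
            (length,) = struct.unpack_from(">H", record, offset)
            start = offset + 2
            return record[start:start + length].decode("utf-8"), i + 1
    return None, count

record = build_record([(i, f"value-{i}") for i in range(50)])
print(read_field(record, 49))  # prints: ('value-49', 50)
```

Looking up the last of 50 properties examines all 50 header entries (49 skipped plus the match), but only the requested value is decoded. The scan is O(n) in the number of properties per record, and the jump to the value is O(1) once its header entry is found.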

Fortunately, this task is blazing fast because it works on an unmarshalled and compressed byte[] that, with modern processors, can easily be kept in L1/L2 cache.

This allows great flexibility because you can work in schema-less, schema-full, and hybrid modes. In hybrid mode you define only a few properties in the schema, and the rest are managed in schema-less mode.

I hope this answers your question.
