Martin Drozdik

Reputation: 13313

Does ArangoDB need to load entire document into memory when only a certain key-value is needed?

It seems that in MongoDB, requesting a single key-value pair from a document still requires loading the entire document into memory.

I wonder if this is also the case with ArangoDB.

It seems that with MongoDB this is a fundamental limitation, since the underlying document format is BSON, which is designed for traversal rather than random access. ArangoDB, on the other hand, seems to use VelocyPack (VPack), which has a small index table to support random access. So unless the queried document is absurdly nested or smaller than the operating system page size, I would expect that only the page containing the given key-value pair gets loaded into memory. Am I right?

The reason I am asking is that I am designing a database to store the results of huge numeric experiments. One experiment can (rarely) produce up to 1 GB of data. I would like to keep one document per experiment. However, if I have 100 such experiments and want to retrieve only one key-value pair from each, does my machine need to load 100 GB into RAM?
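For illustration, the kind of lookup I have in mind might be expressed in AQL roughly like this (the collection name experiments and the attribute summary are made up for the example):

```aql
// Return one small attribute from each of the ~100 experiment documents.
// The question is whether each full (up to 1 GB) document must be
// materialized in memory just to read this one attribute.
FOR e IN experiments
  RETURN e.summary
```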

Upvotes: 1

Views: 288

Answers (1)

CodeManX

Reputation: 11885

With the MMFiles storage engine, all documents get loaded into memory anyway when the collection is loaded. Indexes need to be rebuilt on every load because they are not persisted. Document data gets synced to disk. Overall, it is a mostly-in-memory approach.

With the RocksDB storage engine, documents and indexes are persisted, and there is no need to load collections entirely or partially into memory. Instead, there is a hot set of frequently used documents; whatever is not in it can be loaded from disk. The entire document data can be several times larger than main memory, unlike with the MMFiles engine.
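The storage engine is selected once at server startup (ArangoDB 3.2 and later), for example:

```
arangod --server.storage-engine rocksdb
```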

In general, documents involved in a query get loaded into memory as a whole with RocksDB engine. However, there is an optimization to extract a single attribute, if you only ask for something like FOR doc IN coll RETURN doc.title:

Optimization rules applied:
 Id   RuleName
  1   reduce-extraction-to-projection

This will soon be extended to support up to 5 attributes in v3.3 and above.

Another optimization will allow queries to be answered from indexes alone if all requested attributes are indexed, which removes the need to load the documents from disk into memory at all.
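As a sketch, such an index-only query could look like the following, assuming a persistent index on title has been created beforehand (collection name, index type, and attribute are illustrative):

```aql
// Assumes an index on `title`, e.g. created in arangosh with:
//   db.coll.ensureIndex({ type: "persistent", fields: ["title"] });
// With the covering-index optimization, a query like this can be
// answered from the index alone, without fetching the documents.
FOR doc IN coll
  FILTER doc.title != null
  RETURN doc.title
```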

Some of that may help performance in your use case. However, you shouldn't store documents as large as 1 GB for another reason: both engines are append-only, so any modification results in a new document revision. Copying 1 GB of document data to update a single attribute is not going to be performant. If you don't intend to change the documents, though, this might not be a concern.
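One common workaround for the revision-copy cost (a sketch; the collection and attribute names are made up) is to split each experiment into a small metadata document and separate payload documents, so updating metadata never rewrites the bulky results:

```aql
// Small, frequently-updated metadata document.
INSERT { _key: "exp42", status: "running", started: "2017-10-01" }
  INTO experiments

// Bulk numeric results stored in a separate collection, linked by
// experiment key, so updating `status` above copies only a few bytes.
INSERT { experiment: "exp42", part: 0, values: [ /* ... */ ] }
  INTO results
```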

Upvotes: 1
