Reputation: 43
In my Node.js application I have arrays of uint32 integers which I need to store on documents in a MongoDB collection. There will never be a need to access the array items through MongoDB queries, nor do I need to create indexes on that field. Is there any performance cost or other drawback to storing arrays of dozens or hundreds of numbers, compared with using some other format?
Since the contents of the array will only be used at the application level and not at the database level, they could be stored in MongoDB in whatever format I want. I would then convert the query results retrieved from MongoDB to the format the application requires (a Uint32Array).
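To make the conversion step concrete, here is a minimal sketch of one possible round trip between a Uint32Array and a raw byte payload. The function names are hypothetical. It assumes the payload would be stored as a BSON binary field; as far as I know the official Node.js driver serializes a Buffer that way and by default returns a Binary wrapper on reads, so the bytes would have to be taken from that wrapper first. The bytes are in the platform's native byte order, which matters if writers and readers can differ in endianness.

```javascript
// Hypothetical helper: view the typed array's bytes as a Buffer (no copy).
function toBinaryPayload(values) {
  return Buffer.from(values.buffer, values.byteOffset, values.byteLength);
}

// Hypothetical helper: rebuild a Uint32Array from a Buffer/Uint8Array.
// The bytes are copied into a fresh ArrayBuffer so the result is always
// 4-byte aligned, which a slice of a pooled Buffer may not be.
function fromBinaryPayload(bytes) {
  if (bytes.byteLength % 4 !== 0) {
    throw new RangeError("payload length must be a multiple of 4 bytes");
  }
  const copy = new Uint8Array(bytes.byteLength);
  copy.set(bytes);
  return new Uint32Array(copy.buffer);
}

const original = new Uint32Array([0, 1, 123456, 4294967295]);
const payload = toBinaryPayload(original);   // 16 bytes
const restored = fromBinaryPayload(payload); // Uint32Array with 4 items
```

The copy on the read side costs one extra pass over the data, but it avoids alignment errors when the incoming bytes start at an odd offset.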
I can imagine the format may affect the required storage space too.
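A rough way to compare the storage cost of the two formats is to count bytes according to the BSON layout. The sketch below is only an estimate of the uncompressed field value, with hypothetical function names. It assumes the driver writes integers up to 2^31 - 1 as 4-byte int32 values and larger uint32 values as 8-byte doubles, which is my understanding of the JavaScript driver's behaviour. On-disk size will differ because WiredTiger compresses data.

```javascript
// Estimated bytes for a BSON array: each element carries a type byte,
// its index as a null-terminated string key, and the value itself.
function estimateBsonArrayBytes(values) {
  let bytes = 4 + 1; // int32 length prefix + trailing 0x00 terminator
  for (let i = 0; i < values.length; i++) {
    const keyBytes = String(i).length + 1;             // "0", "1", ... plus 0x00
    const valueBytes = values[i] <= 0x7fffffff ? 4 : 8; // assumed int32 vs double
    bytes += 1 + keyBytes + valueBytes;                // 1 = element type byte
  }
  return bytes;
}

// Estimated bytes for a BSON binary value: length, subtype, raw payload.
function estimateBsonBinaryBytes(values) {
  return 4 + 1 + values.byteLength;
}

const sample = new Uint32Array(100); // 100 zeros
console.log(estimateBsonArrayBytes(sample));  // 795
console.log(estimateBsonBinaryBytes(sample)); // 405
```

Under these assumptions the array form costs roughly twice as much as the binary form for a hundred small values, and more once values exceed the int32 range.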
Upvotes: 1
Views: 466
Reputation: 8695
Big documents take more time to load into memory, and therefore more time to query or update.
With the WiredTiger storage engine, MongoDB loads the complete document into memory, not only the part it needs for the query or update.
For example:
{"a": 1, "myarray": [0...100000]}
If I want to update only "a", or query based only on "a", "myarray" is loaded as well.
Solutions that can help
An index on "a" (to avoid a collection scan) helps with both querying and updating, but only if you query or update just a few of those documents.
Split the document into two parts: the commonly used part, which serves finds and updates, and the big part, which is read only when needed.
The basic data modelling suggestion is to avoid loading data into memory that will not be needed, especially repeatedly. This can be done by splitting documents into a small, commonly used part and a big, more rarely used part.
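The splitting idea can be sketched as a small helper that separates one logical record into two documents sharing the same _id. This is only an illustration: the function name is hypothetical, and which collections the two parts go into is left to the application.

```javascript
// Hypothetical helper: separate the rarely used big field from the rest.
// Both resulting documents keep the same _id so they can be joined later.
function splitDocument(doc, bigField) {
  const { [bigField]: big, ...small } = doc;
  return {
    small,                                   // commonly used part
    large: { _id: doc._id, [bigField]: big } // loaded only when needed
  };
}

const record = { _id: 1, a: 1, myarray: [10, 20, 30] };
const { small, large } = splitDocument(record, "myarray");
// small -> { _id: 1, a: 1 }
// large -> { _id: 1, myarray: [10, 20, 30] }
```

Finds and updates on "a" would then touch only the small documents, and the large document would be fetched by _id when the array is actually required.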
Benchmark example
3k documents
colA = {"a": 1, "myarray": [0...100000]}
colB = {"a": 1}
Query (filter only on "a", no index)
colA: "Elapsed time: 7729.74552 msecs"
colB: "Elapsed time: 35.06918 msecs"
Query with an index on "a": the times are the same
Update (change the value of "a" to 2)
colA: "Elapsed time: 7176.342247 msecs"
colB: "Elapsed time: 23.257879 msecs"
There may be other solutions, or more arguments about which approach is best, but the benchmark shows a very big difference.
My suggestion is to use an index to avoid a collection scan and/or to split the document (if you have very big arrays).
See this from MongoDB University as well; it has an example of splitting a document.
Upvotes: 2
Reputation: 6629
According to 6 Rules of Thumb for MongoDB Schema Design, I think using an embedded array is OK for you, since your structure is one-to-many, not one-to-squillions. Also, your array consists of uint32 values, not objects.
These two discussions might help you.
Upvotes: 0