Let's say I have 1,000,000,000 entities in a MongoDB, and each entity has 3 numerical properties, A, B, and C.
For example:
entity1 : { A: 35, B: 60, C: 5 }
entity2 : { A: 15, B: 10, C: 55 }
entity3 : { A: 10, B: 10, C: 10 }
...
Now I need to query the database. The input of the query would be 3 numbers: (a, b, c). The result would be a list of entities in descending order of the weighted sum A * a + B * b + C * c.
So q(1, 100, 1) would return (entity1, entity2, entity3), with scores 6040, 1070 and 1020,
and q(1, 1, 100) would return (entity2, entity3, entity1), with scores 5525, 1020 and 595.
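The expected ordering can be checked with a minimal plain-JavaScript sketch of the brute-force approach: score every entity with A * a + B * b + C * c and sort descending. The `name` field and the function name `q` are illustrative only; this recomputes every score on each query, which is exactly what I would like to avoid at scale.

```javascript
// Example entities from above; "name" is a hypothetical label for readability.
const entities = [
  { name: "entity1", A: 35, B: 60, C: 5 },
  { name: "entity2", A: 15, B: 10, C: 55 },
  { name: "entity3", A: 10, B: 10, C: 10 },
];

// Score each entity with the weighted sum A*a + B*b + C*c, then sort descending.
// Brute force: every score is recomputed on every query.
function q(a, b, c) {
  return entities
    .map((e) => ({ name: e.name, weight: e.A * a + e.B * b + e.C * c }))
    .sort((x, y) => y.weight - x.weight)
    .map((e) => e.name);
}

console.log(q(1, 100, 1)); // [ 'entity1', 'entity2', 'entity3' ]
console.log(q(1, 1, 100)); // [ 'entity2', 'entity3', 'entity1' ]
```

For ~5000 entities this is cheap; the cost is one multiply-add per property per entity plus an O(n log n) sort per query.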
Can something like this be achieved with MongoDB without calculating the weighted sum of every entity on every query? I am not bound to MongoDB, but am learning the MEAN stack. If I have to use something else, that is fine too.
NOTE: I chose 1,000,000,000 entities as an extreme example. My actual use case will only have ~5000 entities, so iterating over everything might be OK; I am just interested in a more clever solution.
Well, since the weights are provided as input at query time, you cannot use a pre-calculated field, so the value has to be calculated for each document on every query. The only real choice is whether to return all items and sort them in the client, or to let the server do the work:
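var a = 1,   // weight for A
    b = 1,   // weight for B
    c = 100; // weight for C

db.collection.aggregate(
    [
        // Compute weight = A*a + B*b + C*c for each document
        { "$project": {
            "A": 1,
            "B": 1,
            "C": 1,
            "weight": {
                "$add": [
                    { "$multiply": [ "$A", a ] },
                    { "$multiply": [ "$B", b ] },
                    { "$multiply": [ "$C", c ] }
                ]
            }
        }},
        // Sort on the computed field, highest weight first
        { "$sort": { "weight": -1 } }
    ],
    // Let pipeline stages write temporary data to disk
    { "allowDiskUse": true }
)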
So the key here is that the .aggregate() method allows for document manipulation, which is required to generate the value on which to apply the $sort.
The calculated value is produced in a $project pipeline stage before the sort: $multiply multiplies each field value by the corresponding external variable fed into the pipeline, and a final $add sums the three products to produce "weight" as a field to sort on.
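The same pipeline can be wrapped in a function so the weights become parameters instead of shell variables. This is a minimal sketch; `buildWeightPipeline` is a hypothetical helper name, not part of MongoDB, and it only builds the pipeline array, so it runs without a database.

```javascript
// Hypothetical helper: builds the $project + $sort pipeline for weights (a, b, c).
function buildWeightPipeline(a, b, c) {
  return [
    {
      $project: {
        A: 1,
        B: 1,
        C: 1,
        // weight = A*a + B*b + C*c, computed per document at query time
        weight: {
          $add: [
            { $multiply: ["$A", a] },
            { $multiply: ["$B", b] },
            { $multiply: ["$C", c] },
          ],
        },
      },
    },
    // Sort on the projected field, highest weight first
    { $sort: { weight: -1 } },
  ];
}

// Usage in the mongo shell (not run here):
// db.collection.aggregate(buildWeightPipeline(1, 1, 100), { allowDiskUse: true })
```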
You cannot pass an expression directly to any "sort" method in MongoDB, because sorting has to act on a field present in the document. The aggregation framework provides the means to "project" this value, so a later pipeline stage can then perform the required sort.
The other consideration is that, given the number of documents you are proposing, it is better to supply "allowDiskUse" as an option so the aggregation process can store processed documents temporarily on disk rather than in memory, as there is a limit (100 megabytes of RAM per pipeline stage) on the memory an aggregation can use without this option.
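If only the top N results are actually needed, one further option (not something the question asked for) is to add a $limit stage directly after the $sort. MongoDB's optimizer can then coalesce the two stages, so the sort only keeps the top N documents as it goes instead of the whole result set. A minimal sketch; `withTopN` is a hypothetical helper name:

```javascript
// Hypothetical helper: returns a copy of the pipeline with a $limit stage appended.
// With $limit immediately after $sort, MongoDB can coalesce the two stages,
// so only the top n documents are retained while sorting.
function withTopN(pipeline, n) {
  if (!Number.isInteger(n) || n <= 0) {
    throw new RangeError("n must be a positive integer");
  }
  return [...pipeline, { $limit: n }];
}

// Example: weights (1, 1, 100) projected, sorted, then limited to 10 results.
const pipeline = [
  {
    $project: {
      weight: {
        $add: [
          { $multiply: ["$A", 1] },
          { $multiply: ["$B", 1] },
          { $multiply: ["$C", 100] },
        ],
      },
    },
  },
  { $sort: { weight: -1 } },
];
const topTen = withTopN(pipeline, 10);

// Usage in the mongo shell (not run here):
// db.collection.aggregate(topTen, { allowDiskUse: true })
```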