jack blank

Reputation: 5195

Sort documents by a present field and a calculated value

How would I go about displaying the best reviews and the worst reviews at the top of the page?

I think the user's "useful" and "notUseful" votes should have an effect on the result.

I have reviews, and if people click the useful or notUseful buttons, their id gets added to the appropriate array (useful or notUseful).

You can tell whether a review is positive or negative by its "overall" score, which runs from 1 through 5: 1 is the worst and 5 is the best.

I guess if someone gave a review with a 5 overall score but only got one useful vote, and someone else gave a 4 overall with 100 people clicking "useful", the one with 100 votes should be shown as the best positive?

I only want to show 2 reviews at the top of the page: the best review and the worst review. If there are ties in the overall scores, the deciding factor should be the usefulness. So if there are 2 reviews with the same overall score, and one of them has 5 usefuls and 10 notUsefuls, that would be -5 usefuls; if the other review has 5 usefuls and 4 notUsefuls, that would be 1 useful, so that one would be shown to break the tie.

I'm hoping to do it with one mongoose query and not aggregation, but I think the answer will be aggregation.

I guess there could be a cutoff, like scores greater than 3 are positive reviews and lower scores are negative reviews.

I use mongoose. Thanks in advance for your help.

Some sample data:

{
    "_id" : ObjectId("5929f89a54aa92274c4e4677"),
    "compId" : ObjectId("58d94c441eb9e52454932db6"),
    "anonId" : ObjectId("5929f88154aa92274c4e4675"),
    "overall" : 3,
    "titleReview" : "53",
    "reviewText" : "53",
    "companyName" : "store1",
    "replies" : [],
    "version" : 2,
    "notUseful" : [ObjectId("58d94c441eb9e52454932db6")],
    "useful" : [],
    "dateCreated" : ISODate("2017-05-27T22:07:22.207Z"),
    "images" : [],
    "__v" : 0
}


{
    "_id" : ObjectId("5929f8dfa1435135fc5e904b"),
    "compId" : ObjectId("58d94c441eb9e52454932db6"),
    "anonId" : ObjectId("5929f8bab0bc8834f41e9cf8"),
    "overall" : 3,
    "titleReview" : "54",
    "reviewText" : "54",
    "companyName" : "store1",
    "replies" : [],
    "version" : 1,
    "notUseful" : [ObjectId("5929f83bf371672714bb8d44"), ObjectId("5929f853f371672714bb8d46")],
    "useful" : [],
    "dateCreated" : ISODate("2017-05-27T22:08:31.516Z"),
    "images" : [],

    "__v" : 0
}


{
    "_id" : ObjectId("5929f956a692e82398aaa2f2"),
    "compId" : ObjectId("58d94c441eb9e52454932db6"),
    "anonId" : ObjectId("5929f93da692e82398aaa2f0"),
    "overall" : 3,
    "titleReview" : "56",
    "reviewText" : "56",
    "companyName" : "store1",
    "replies" : [],
    "version" : 1,
    "notUseful" : [],
    "useful" : [],
    "dateCreated" : ISODate("2017-05-27T22:10:30.608Z"),
    "images" : [],
    "__v" : 0
}

Upvotes: 1

Views: 138

Answers (1)

Neil Lunn

Reputation: 151132

If I am reading your question correctly, it appears you want a calculated difference of the "useful" and "notUseful" votes to also be taken into account when sorting on the "overall" score of the documents.

The better option here is to include that calculation in your stored documents, but for completeness we will cover both options.

Aggregation

Without changes to your schema and other logic, aggregation is indeed required to do that calculation. This is best presented as:

Model.aggregate([
  { "$addFields": {
     "netUseful": {
       "$subtract": [
         { "$size": "$useful" },
         { "$size": "$notUseful" }
       ]
     }
  }},
  { "$sort": { "overall": 1, "netUseful": -1 } }
],function(err, result) {

})

So you are basically getting the difference between the sizes of the two arrays, where more "useful" responses have a positive impact, boosting the ranking, and more "notUseful" responses reduce it. Depending on the MongoDB version you have available, you use either $addFields with only the additional field, or $project with all the fields you need to return.
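For a MongoDB version without $addFields (it arrived in 3.4), the first stage could be written with $project instead. This is only a sketch: the list of returned fields is an assumption taken from the sample documents, so adjust it to whatever you actually need.

```javascript
// Same calculation using $project, which must name every field to keep.
// The field selection below is an assumption based on the sample data.
const pipeline = [
  { "$project": {
    "overall": 1,
    "titleReview": 1,
    "reviewText": 1,
    "useful": 1,
    "notUseful": 1,
    "netUseful": {
      "$subtract": [
        { "$size": "$useful" },
        { "$size": "$notUseful" }
      ]
    }
  }},
  { "$sort": { "overall": 1, "netUseful": -1 } }
];

// Then pass it along just as before: Model.aggregate(pipeline, callback)
```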

The $sort is then performed on the combination of the "overall" score in ascending order as per your rules, and the new field of "netUseful" in descending order ranking "positive" to "negative".
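To see what that ordering does to the sample data from the question, here is the same logic in plain JavaScript with no database involved. It only illustrates the ordering; the trimmed-down `reviews` array and the `rank` helper are made up for the example.

```javascript
// Trimmed copies of the three sample reviews; voter ids shortened to strings.
const reviews = [
  { titleReview: "53", overall: 3, useful: [], notUseful: ["a"] },
  { titleReview: "54", overall: 3, useful: [], notUseful: ["b", "c"] },
  { titleReview: "56", overall: 3, useful: [], notUseful: [] }
];

// Hypothetical helper mirroring the $addFields + $sort stages.
function rank(docs) {
  return docs
    .map(function (doc) {
      return Object.assign({}, doc, {
        netUseful: doc.useful.length - doc.notUseful.length
      });
    })
    .sort(function (a, b) {
      // "overall" ascending first, then "netUseful" descending
      return (a.overall - b.overall) || (b.netUseful - a.netUseful);
    });
}

const ranked = rank(reviews);
console.log(ranked.map(function (r) { return r.titleReview + ": " + r.netUseful; }));
// [ '56: 0', '53: -1', '54: -2' ]
```

All three samples share the same "overall" score, so the order here is decided entirely by "netUseful".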

Re-Modelling

Forgoing aggregation altogether, you get a faster result from the plain query. But this of course means maintaining that "score" in the document as you add members to the arrays.

At its most basic, you use the $inc update operator along with $push to change the score.

So for a "useful" entry, you would do something like this:

Model.update(
  { "_id": docId, "useful": { "$ne": userId } },
  { 
    "$push": { "useful": userId },
    "$inc": { "netUseful": 1 }
  },
  function(err, status) {

  }
)

And for a "notUseful" you do the opposite by "decrementing" with a negative value to $inc:

Model.update(
  // the array in the sample documents is named "notUseful"
  { "_id": docId, "notUseful": { "$ne": userId } },
  { 
    "$push": { "notUseful": userId },
    "$inc": { "netUseful": -1 }
  },
  function(err, status) {

  }
)

To cover all cases, including where a vote is changed from "useful" to "notUseful", you would expand on the logic and implement the appropriate reverse actions with $pull. But this should give the general idea.
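As a rough sketch of that vote-change case, the query and update documents might be built like this. The `buildVoteChange` helper and its parameters are hypothetical, and the step of 2 assumes the user leaves one array and joins the other in the same update.

```javascript
// Hypothetical helper: builds the query and update for switching a vote.
// `from` and `to` are each "useful" or "notUseful".
function buildVoteChange(docId, userId, from, to) {
  const query = { "_id": docId };
  query[from] = userId;            // the user must currently be in the old array
  query[to] = { "$ne": userId };   // and must not already be in the new one

  const pull = {};
  pull[from] = userId;
  const push = {};
  push[to] = userId;

  return {
    query: query,
    update: {
      "$pull": pull,
      "$push": push,
      // Losing one vote and gaining the opposite one moves the score by 2
      "$inc": { "netUseful": to === "useful" ? 2 : -2 }
    }
  };
}

const change = buildVoteChange("docId", "userId", "useful", "notUseful");
// Would be passed as: Model.update(change.query, change.update, callback)
```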

N.B. The reason we do not use the $addToSet operation here is because we want to make sure the user id is not present in the array when "incrementing" or "decrementing". Thus the $ne operator is used instead to test that the value does not exist. If it does, then we do not attempt to modify the array or affect the "netUseful" value. The same applies to the reverse case of "removing" the user from those votes.

Since the calculation is always maintained with each update, you simply perform a query with a standard .sort():
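The effect of that guard can be simulated in plain JavaScript. This is only a sketch of the behaviour, not how MongoDB applies the update; `voteUseful` is a made-up stand-in for the guarded update on a single in-memory document.

```javascript
// Stand-in for: update({ _id, useful: { $ne: userId } }, { $push, $inc })
function voteUseful(doc, userId) {
  // The $ne condition: no match (and so no change) if the user already voted
  if (doc.useful.indexOf(userId) !== -1) {
    return false;
  }
  doc.useful.push(userId);  // $push
  doc.netUseful += 1;       // $inc
  return true;
}

const review = { useful: [], notUseful: [], netUseful: 0 };
const first = voteUseful(review, "user1");   // applied
const second = voteUseful(review, "user1");  // skipped, same user again

console.log(first, second, review.netUseful);  // true false 1
```

With $addToSet and an unconditional $inc, the second call would leave the array unchanged but still bump the counter, so the stored score would drift away from the real vote count.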

Model.find().sort({ "overall": 1, "netUseful": -1 }).exec(function(err,results) {

})

So by moving the "cost" into the maintenance of the "votes", you remove the overhead of running the aggregation later. For my money, where this is a regular operation and the "sort" does not rely on other run-time parameters which force the calculation to be dynamic, then you use the stored result instead.

Upvotes: 1
