Reputation: 5780
The "book" collection has the following fields: name, score, and votes.
A book is popular if it has a good score and a lot of votes.
I want to query all books and return the popular ones first. Initially I did something like:
db.collection('book').find().sort({ score: -1, votes: -1 })
Which returns these books:
name | score | votes
--------------------
foo | 4.9 | 3
bar | 4.6 | 203223
baz | 4.3 | 323299
As you can see, the first returned result (the book named "foo") has a very good score, but very few votes. I would like to exclude it, or at least give it less importance.
How can I update the previous query to take both the score and votes fields into consideration?
Answer:
I ended up using: https://www.quora.com/How-does-IMDbs-rating-system-work#:~:text=The%20formula%20for%20calculating%20the,for%20the%20movie%20%3D%20(votes)
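For reference, the IMDb formula is usually quoted as WR = (v / (v + m)) * R + (m / (v + m)) * C, where R is the item's average score, v its vote count, m a minimum-votes threshold, and C the mean score across all items. A minimal sketch of it follows; the function name imdbWeightedRating and the sample values for m and C are assumptions for illustration, not taken from the linked page.

```javascript
// IMDb-style weighted rating: scores with few votes are pulled toward the global mean C.
// `imdbWeightedRating` is a hypothetical helper name.
function imdbWeightedRating(R, v, m, C) {
  return (v / (v + m)) * R + (m / (v + m)) * C;
}

// Example with assumed values m = 100 and C = 4.0 (both hypothetical).
const m = 100;
const C = 4.0;
console.log(imdbWeightedRating(4.9, 3, m, C));      // stays close to C: very few votes
console.log(imdbWeightedRating(4.6, 203223, m, C)); // stays close to 4.6: many votes
```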
Upvotes: 1
Views: 148
Reputation: 608
You can use a weighting function for this. Something like a simplified Bayesian estimator: https://en.wikipedia.org/wiki/Bayes_estimator#Practical_example_of_Bayes_estimators
W = (R*v) / (v + m)
where:
W = weighted rating
R = average rating (the value of score)
v = number of votes
m = weight given to the prior estimate (in this case, the minimum number of votes a score needs to be seen as 'valid')
I'm using m = 100 in this case, but you can use anything:
foo | 4.9 | 3
bar | 4.6 | 203223
baz | 4.3 | 323299
So foo would have a weighted rating of (4.9 * 3) / (3 + 100) = 14.7 / 103 = 0.1427
bar would be (4.6 * 203223) / (203223 + 100) = 4.5977
(almost 4.6)
baz would be (4.3 * 323299) / (323299 + 100) = 4.2987
(almost 4.3, but closer to 4.3 than bar is to 4.6 because it has more votes)
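The arithmetic above can be reproduced with a small helper. This is only a sketch: the function name weightedRating and the default m = 100 are illustrative choices, and the sorting here happens in application code rather than in the database.

```javascript
// Simplified weighted rating: W = (R * v) / (v + m).
// `weightedRating` and the default m = 100 are illustrative assumptions.
function weightedRating(score, votes, m = 100) {
  return (score * votes) / (votes + m);
}

const books = [
  { name: 'foo', score: 4.9, votes: 3 },
  { name: 'bar', score: 4.6, votes: 203223 },
  { name: 'baz', score: 4.3, votes: 323299 },
];

// Attach the weighted rating to a copy of each book, then sort highest first.
const ranked = books
  .map((b) => ({ ...b, weighted: weightedRating(b.score, b.votes) }))
  .sort((a, b) => b.weighted - a.weighted);

for (const b of ranked) {
  console.log(b.name, b.weighted.toFixed(4));
}
```

With m = 100 this puts bar first, then baz, then foo, which is the ordering the question was after.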
And here are some more values (the last column is the weighted rating):
one | 4.7 | 90 | 2.226
two | 4.6 | 100 | 2.3
three | 4.5 | 110 | 2.357
So you can see how a higher score with fewer votes is weighted less, but once you're far past the minimum number of votes, the weighted rating is basically the same as the average score.
(I simplified the calculation that was in the wiki page)
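To apply this weighting inside MongoDB instead of in application code, one option is an aggregation pipeline that computes the weighted rating per document and sorts on it. The sketch below assumes the score and votes field names from the question; the helper name popularBooksPipeline, the output field weighted, and m = 100 are illustrative, not anything the driver provides.

```javascript
// Builds an aggregation pipeline that sorts by W = (score * votes) / (votes + m).
// `popularBooksPipeline` and the `weighted` field are hypothetical names.
function popularBooksPipeline(m = 100) {
  return [
    {
      // Compute the weighted rating for each document.
      $addFields: {
        weighted: {
          $divide: [
            { $multiply: ['$score', '$votes'] },
            { $add: ['$votes', m] },
          ],
        },
      },
    },
    // Highest weighted rating first.
    { $sort: { weighted: -1 } },
  ];
}

// Usage, assuming `db` is a connected database handle as in the question:
// const popular = await db.collection('book').aggregate(popularBooksPipeline()).toArray();
```

Because m is positive, the divisor is never zero as long as votes is not negative. A computed sort like this cannot use an index, so for a large collection it may be worth storing the weighted value on each document and updating it when votes change.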
Upvotes: 3