Alex
Alex

Reputation: 9265

Item rankings, order by confidence using Reddit Ranking Algorithms

I am interested to use this ranking class, based off of an article by Evan Miller to rank a table I have that has upvotes and downvotes. I have a system very similar to Stack Overflow's up/down voting system for an events site I am working on, and by using this ranking class I feel as though results will be more accurate. My question is how do I order by the function 'hotness'?

private function _hotness($upvotes = 0, $downvotes = 0, $posted = 0) {
    $s = $this->_score($upvotes, $downvotes);
    $order = log(max(abs($s), 1), 10);

    if($s > 0) {
        $sign = 1;
    } elseif($s < 0) {
        $sign = -1;
    } else {
        $sign = 0;
    }

    $seconds = $posted - 1134028003;

    return round($order + (($sign * $seconds)/45000), 7);
}

I suppose each time a user votes I could have a column in my table that has the hotness data recalculated for the new vote, and order by that column on the main page. But I am interested to do this more on-the-fly incorporating the function above, and I am not sure if that is possible.

From Evan Miller, he uses:

SELECT widget_id, ((positive + 1.9208) / (positive + negative) - 
                   1.96 * SQRT((positive * negative) / (positive + negative) + 0.9604) / 
                          (positive + negative)) / (1 + 3.8416 / (positive + negative)) 
       AS ci_lower_bound FROM widgets WHERE positive + negative > 0 
       ORDER BY ci_lower_bound DESC;

But I rather not do this calculation in the sql as I feel this is messy and difficult to change down the line if I utilize this code on multiple pages .etc.

Upvotes: 8

Views: 1425

Answers (3)

Michal M.
Michal M.

Reputation: 1064

You are right, query like this is rather messy and expensive as well.

Mixed PHP/MySQL on the fly is a bad idea well as you will have to select values for all posts and calculate hotness and then select a list of hotest ones. Extremely expensive.

You should consider saving at least part of your calculation to the database. Definitely order should go to the database. It's always better to calculate something and save just once on every save/update, instead of calculating each time it will be displayed. Try to do a benchmark on how much time you will save by calculating order on save/update instead of every time you calculate the hotness. Good thing is that order never changes unless someone upvotes/downvotes which you save to the db anyway, same for the sign.

Even if you save the sign to the db you are stil not able to avoid calculating on the fly due to the posted timestamp parameter.

I would see what difference does it make and where it makes a difference and calculate hotness with a CLI script every x amount of time only for those scripts where this is crucial, every y amount of time where it's making less of a difference.

Taking this approach you will be recalculating hotness only when necessary. This will make your application much more efficient.

Upvotes: 1

Mukul Joshi
Mukul Joshi

Reputation: 324

I am not sure if it is possible with your DB and Schema however have you consider writing a UDF for custom sorting?

A post from stackoverflow talks about how to do this here.

Upvotes: 0

user2480047
user2480047

Reputation:

Accessing the corresponding "Posts" table for anything (reading, writing, sorting, comparing, etc.) is extremely quick and thus relying on the database is the "most on-the-fly" alternative you have for non-temporary data storage (memory/sessions are still quicker but, logically, cannot be used to store this information).

You should be more worried about building a good ranking algorithm delivering the results you want (you are proposing two different systems, delivering different results) and working on making the whole code and the code-database communication as efficient as possible.

In principle, small codes with iterative simple orders offer the quickest and most reliable solution for this kind of situations. Example:

  1. Ranking function (like the first one you propose or any other one built on the ranking rules you want) called every time a vote is given. It writes to the corresponding column(s) in the "Posts" table (the simpler the query, the better: you can create a ranking system as complex as you wish, but try to rely on PHP rather than on queries).

  2. Every time a comparison between posts is required, the "Posts" table is read with a simple SELECT ordering the records by ranking (you can have various "assessing columns" (e.g., up-votes, down-votes, further considerations); but better having one with the definitive ranking).

Upvotes: 3

Related Questions