dreeves

Reputation: 26952

Implementing the Hacker News ranking algorithm in SQL

Here's how Paul Graham describes the ranking algorithm for Hacker News:

News.YC's is just

(p - 1) / (t + 2)^1.5

where p = points and t = age in hours
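
As a quick sanity check of the formula, here is a minimal Python sketch. The function name `hn_score` and the sample values are my own, chosen for illustration; they are not taken from Hacker News.

```python
def hn_score(points: float, age_hours: float) -> float:
    """Score = (p - 1) / (t + 2)^1.5, with p = points and t = age in hours."""
    return (points - 1) / (age_hours + 2) ** 1.5


# Hypothetical example: the same 10-point post scored at different ages.
for age in (1, 7, 24):
    print(age, round(hn_score(10, age), 3))
```

The score falls as the post ages, and a post with a single point scores 0 at any age, because the numerator is p - 1.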

I'd like to do that in pure MySQL given the following tables:

The idea of the vote field is that votes can be rescinded. For the purposes of the ranking, vote=0 is equivalent to no vote at all. (All votes are upvotes, no such thing as downvotes.)

The question is how to construct a query that returns the top N postIDs, sorted by Paul Graham's formula. There are approximately 100k posts altogether, so if you think caching the scores (or something similar) will be needed, I'd love to hear advice about that.

(Obviously this is not rocket science and I can certainly figure it out but I figured someone who eats SQL for breakfast, lunch, and dinner could just rattle it off. And it seems valuable to have available on StackOverflow.)


Upvotes: 13

Views: 3704

Answers (2)

OMG Ponies

Reputation: 332661

Untested:

  SELECT x.*
    FROM POSTS x
    JOIN (SELECT p.postid,
                 SUM(v.vote) AS points  -- rescinded votes (vote = 0) add nothing
            FROM POSTS p
            JOIN VOTES v ON v.postid = p.postid
        GROUP BY p.postid) y ON y.postid = x.postid
         -- (points - 1) / (age in hours + 2)^1.5
ORDER BY (y.points - 1) / POW((UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(x.timestamp)) / 3600 + 2, 1.5) DESC
   LIMIT n  -- replace n with the number of posts to return
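
Since the query is untested, one way to check its ordering is to compare it against a small in-memory reference. The following is only a sketch in Python under stated assumptions: the function name `top_posts`, the dictionary/tuple input shapes, and the use of Unix seconds for post timestamps are all hypothetical, not part of the original schema.

```python
from collections import defaultdict


def top_posts(posts, votes, now, n):
    """Return the top-n post ids by (p - 1) / (t + 2)^1.5.

    posts: {postid: created time in Unix seconds} (assumed shape)
    votes: iterable of (postid, vote) rows, vote is 1 or 0 (assumed shape)
    now:   current time in Unix seconds
    """
    points = defaultdict(int)
    for postid, vote in votes:
        points[postid] += vote  # a rescinded vote (0) adds nothing

    def score(postid):
        age_hours = (now - posts[postid]) / 3600
        return (points[postid] - 1) / (age_hours + 2) ** 1.5

    # Keep only posts that have at least one vote row, like the inner JOIN.
    voted = [pid for pid in points if pid in posts]
    return sorted(voted, key=score, reverse=True)[:n]
```

Like the inner JOIN in the query, this skips posts that have no rows in the votes data at all. If such posts should still be ranked, the SQL would need a LEFT JOIN with a default of 0 points.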

Upvotes: 22

ayalcinkaya

Reputation: 3343

$sql=mysql_query("SELECT * FROM news 
                         ORDER BY ((noOfLike-1)/POW(((UNIX_TIMESTAMP(NOW()) - 
                         UNIX_TIMESTAMP(created_at))/3600)+2,1.5)) DESC 
                 LIMIT 20");

This query works for me to build a home page ranked like HN.

news: the table name.

noOfLike: the total number of users who liked the news item.

created_at: the timestamp of when the news item was posted.

Upvotes: 8
