ed209
ed209

Reputation: 11293

Popularity Algorithm

I'd like to populate the homepage of my user-submitted-illustrations site with the "hottest" illustrations uploaded.

Here are the measures I have available:

I have searched around, but don't want user authority to play a part, but most algorithms include that.

I also need to find out if it's better to do the calculation in the MySQL that fetches the data or if there should be a PHP/cron method every hour or so.

I only need 20 illustrations to populate the home page. I don't need any sort of paging for this data.

How do I weight age against votes? Surely a site with less submission needs less weight on date added?

Upvotes: 4

Views: 3296

Answers (4)

AvatarKava
AvatarKava

Reputation: 15443

Many sites that use some type of popularity ranking do so by using a standard algorithm to determine a score and then decaying eternally over time. What I've found works better for sites with less traffic is a multiplier that gives a bonus to new content/activity - it's essentially the same, but the score stops changing after a period of time of your choosing.

For instance, here's a pseudo-example of something you might want to try. Of course, you'll want to adjust how much weight you're attributing to each category based on your own experience with your site. Comments are rare, but take more effort from the user than a favorite/vote, so they probably should receive more weight.

score = (votes / 10) + comments  
age = UNIX_TIMESTAMP() - UNIX_TIMESTAMP(date_created)

if(age < 86400) score = score * 1.5

This type of approach would give a bonus to new content uploaded in the past day. If you wanted to approach this in a similar way only for content that had been favorited or commented on recently, you could just add some WHERE constraints on your query that grabs the score out from the DB.

There are actually two big reasons NOT to calculate this ranking on the fly.

  1. Requiring your DB to fetch all of that data and do a calculation on every page load just to reorder items results in an expensive query.
  2. Probably a smaller gotcha, but if you have a relatively small amount of activity on the site, small changes in the ranking can cause content to move pretty drastically.

That leaves you with either caching the results periodically or setting up a cron job to update a new database column holding this score you're ranking by.

Upvotes: 4

David Z
David Z

Reputation: 131550

Obviously there is some subjectivity in this - there's no one "correct" algorithm for determining the proper balance - but I'd start out with something like votes per unit age. MySQL can do basic math so you can ask it to sort by the quotient of votes over time; however, for performance reasons, it might be a good idea to cache the result of the query. Maybe something like

SELECT images.url FROM images ORDER BY (NOW() - images.date) / COUNT((SELECT COUNT(*) FROM votes WHERE votes.image_id = images.id)) DESC LIMIT 20

but my SQL is rusty ;-)

Taking a simple average will, of course, bias in favor of new images showing up on the front page. If you want to remove that bias, you could, say, count only those votes that occurred within a certain time limit after the image being posted. For images that are more recent than that time limit, you'd have to normalize by multiplying the number of votes by the time limit then dividing by the age of the image. Or alternatively, you could give the votes a continuously varying weight, something like exp(-time(vote) + time(image)). And so on and so on... depending on how particular you are about what this algorithm will do, it could take some experimentation to figure out what formula gives the best results.

Upvotes: 2

Georg Sch&#246;lly
Georg Sch&#246;lly

Reputation: 126085

Something like:

(count favorited + k) * / time since last activity

The higher k is the less weight has the number of people having it favorited.

You could also change the time to something like the time it first appeared + the time of the last activity, this would ensure that older illustrations would vanish with time.

Upvotes: 0

Rob
Rob

Reputation: 48369

I've no useful ideas as far as the actual agorithm is concerned, but in terms of implementation, I'd suggest caching the result somewhere, with a periodic update - if the resulting computation results in an expensive query, you probably don't want to slow your response times.

Upvotes: 0

Related Questions