James

Reputation: 15475

What's the best way to normalize scores for ranking things?

I'm curious how to normalize numbers for a ranking algorithm.

Let's say I want to rank a link based on importance, and I have two columns to work with.

So a table would look like this:

url | comments | views

Now, I want to rank comments higher than views, so my first thought is to weight them with something like comments*3. However, if there is a large view count like 40,000 and only 4 comments, the weighted comments get drowned out by the views.

So I'm thinking I have to normalize those scores down to a more equal playing field before I can weight them. Any ideas or pointers to how that's usually done?

Thanks
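To make the problem in the question concrete, here is the naive weighted sum of raw counts. The function name `naive_score` and the sample urls are hypothetical; the weight of 3 is the one suggested in the question.

```python
def naive_score(comments, views, comment_weight=3):
    """Weighted sum of raw counts, as described in the question."""
    # Views are on a far larger scale than comments, so they dominate the sum
    return comment_weight * comments + views


# Hypothetical rows: (url, comments, views)
rows = [("busy-page", 4, 40000), ("discussed-page", 50, 1000)]
for url, comments, views in rows:
    print(url, naive_score(comments, views))
# busy-page 40012
# discussed-page 1150
```

Even with 50 comments against 4, the second page scores far lower, because the comment term contributes only 150 points against tens of thousands of views.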

Upvotes: 3

Views: 8037

Answers (3)

Joel Hoff

Reputation: 1993

A similar problem was discussed a few weeks ago in this SO topic: "Algorithm to calculate a page importance based on its views / comments".

I'll give the same advice I offered there: use linear regression on a representative distribution of comment/view counts for web pages to work out a weighting function.
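A minimal sketch of that idea, assuming you already have some target importance value for a sample of pages (for example, hand-assigned ratings; the linked topic's exact setup isn't reproduced here). It fits `target ≈ w_c * comments + w_v * views` by ordinary least squares with no intercept, solving the 2x2 normal equations directly. The function name `fit_weights` and the sample data are hypothetical; the ratings below are synthetic, generated as `2 * comments + 0.01 * views` so the expected weights are known.

```python
def fit_weights(comments, views, target):
    """Least-squares fit of target ~ w_c * comments + w_v * views (no intercept)."""
    # Entries of X^T X and X^T y, where X has one column per feature
    s_cc = sum(c * c for c in comments)
    s_vv = sum(v * v for v in views)
    s_cv = sum(c * v for c, v in zip(comments, views))
    s_cy = sum(c * y for c, y in zip(comments, target))
    s_vy = sum(v * y for v, y in zip(views, target))
    det = s_cc * s_vv - s_cv * s_cv
    if det == 0:
        raise ValueError("comments and views are collinear; weights are not unique")
    # Solve the 2x2 normal equations with Cramer's rule
    w_c = (s_cy * s_vv - s_cv * s_vy) / det
    w_v = (s_cc * s_vy - s_cv * s_cy) / det
    return w_c, w_v


# Hypothetical sample of four pages with synthetic ratings (2 * comments + 0.01 * views)
comments = [1, 4, 10, 0]
views = [100, 40000, 500, 2000]
ratings = [3.0, 408.0, 25.0, 20.0]
print(tuple(round(w, 6) for w in fit_weights(comments, views, ratings)))  # (2.0, 0.01)
```

Because the fit works on raw counts, the learned weights absorb the scale difference between the columns: the small weight on views is what keeps 40,000 views from swamping a handful of comments.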

Upvotes: 0

Olek Beluga

Reputation: 11

Importance is really a way of telling the user how interested he could be in a forum topic or a blog post. In this case, you can't just multiply the two numbers by different factors and add them :)

What can you say about a blog post with 2000 views and only one comment? Perhaps it's a spam post, or it was mostly viewed by web crawlers, or it's so boring that no one decided to comment on it.

In this case, we might want to look at the ratio of comments to views. The blog post above would have an "interest ratio" of 1/2000, while this question, which currently has 28 views and 1 comment, would get a score of 1/28.

The biggest ratio wins. By the way, if you are getting ratios over one... well, start looking for bugs :)
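A minimal sketch of ranking by this interest ratio. The names `interest_ratio` and `rank_by_interest` and the page labels are hypothetical, and returning 0 for a page with no views is an assumption, since the answer doesn't say how to handle that case.

```python
def interest_ratio(comments, views):
    """Comments per view; higher means readers engaged more with the page."""
    if views == 0:
        # Assumption: an unviewed page gets the lowest possible ratio
        return 0.0
    return comments / views


def rank_by_interest(pages):
    """pages: list of (name, comments, views). Returns names, highest ratio first."""
    ranked = sorted(pages, key=lambda p: interest_ratio(p[1], p[2]), reverse=True)
    return [name for name, _, _ in ranked]


# The two examples from the answer: 1 comment / 2000 views vs. 1 comment / 28 views
pages = [("blog-post", 1, 2000), ("this-question", 1, 28)]
print(rank_by_interest(pages))  # ['this-question', 'blog-post']
```

One thing to keep in mind with a pure ratio: a page with 1 view and 1 comment scores 1.0 and outranks everything else, so you may want a minimum view count before trusting the ratio.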

Upvotes: 1

btreat

Reputation: 1554

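For each url, you could first normalize the comments and views to a value between 0 and 1 (this is min-max scaling rather than a true percentile rank, despite the variable names below; also guard against max equal to min, which would divide by zero). For example,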

 comment_percentile = (comments - min(comments)) / (max(comments) - min(comments))
 views_percentile = (views - min(views)) / (max(views) - min(views))

Then you could assign weights to each of the percentile values to compute the overall score.

 url_score = (comment_percentile_weight * comment_percentile) + (views_percentile_weight * views_percentile)
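The two steps above (min-max normalization, then a weighted sum) can be sketched as follows. The function names and sample urls are hypothetical, and the 0.75/0.25 weights are an assumption chosen to echo the 3:1 "comments*3" preference from the question.

```python
def min_max(values):
    """Rescale values linearly to [0, 1] (min-max scaling, not a percentile rank)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so there is nothing to spread out
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def url_scores(rows, comment_weight=0.75, views_weight=0.25):
    """rows: list of (url, comments, views). Returns (url, score) pairs, best first."""
    urls = [r[0] for r in rows]
    comment_norm = min_max([r[1] for r in rows])
    views_norm = min_max([r[2] for r in rows])
    scored = [
        (url, comment_weight * c + views_weight * v)
        for url, c, v in zip(urls, comment_norm, views_norm)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical rows: (url, comments, views)
rows = [("busy-page", 4, 40000), ("discussed-page", 50, 1000), ("quiet-page", 0, 500)]
print([url for url, _ in url_scores(rows)])  # ['discussed-page', 'busy-page', 'quiet-page']
```

After normalization both columns live on the same 0-to-1 scale, so the weights express the relative importance directly instead of fighting the raw magnitudes.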

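Additional strategies may involve eliminating outliers when the values cluster toward one end of the range: a single url with, say, 40,000 views sets max(views), which squeezes every other url's views_percentile close to 0.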
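The answer mentions eliminating outliers; a closely related alternative, sketched here, is to cap them (sometimes called winsorizing) so the url keeps its row but no longer stretches the range. The function name `cap_outliers`, the 0.95 cutoff, and the floor-based index used to pick the cutoff value are all assumptions, not a standard quantile definition.

```python
def cap_outliers(values, upper_quantile=0.95):
    """Clip values above an approximate upper quantile of the data."""
    if not values:
        return []
    ordered = sorted(values)
    # Floor-based index of the cutoff value within the sorted list
    idx = min(len(ordered) - 1, int(upper_quantile * (len(ordered) - 1)))
    cutoff = ordered[idx]
    return [min(v, cutoff) for v in values]


views = [100, 200, 300, 400, 40000]
print(cap_outliers(views))  # [100, 200, 300, 400, 400]
```

Running the capped values through min-max scaling afterwards spreads the ordinary urls across the 0-to-1 range instead of bunching them near 0.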

Upvotes: 5
