Mike

Reputation: 298

Normalize values in the range of [0-1]

This is a question about normalization of data that takes into account different parameters.

I have a set of articles on a website. Users rate the articles from 1 to 5 stars. 1 star marks an article as 'bad', 2 stars as 'average', and 3, 4 and 5 stars as 'good', 'very good' and 'excellent'.

I want to normalize these ratings to the range [0-2]. The normalized value will represent a score and will be used as a factor for boosting the article up or down in the article listing. Articles with 2 stars or fewer should get a score in the range [0-1], so the boost factor has a negative effect. Articles with 2 stars or more should get a score in the range [1-2], so the boost factor has a positive effect.

So, for example, an article that has 3.6 stars will get a boost factor of 1.4. This will boost the article up in the article listing. An article with 1.9 stars will get a score of 0.8. This score will push the article further down in the listing. An article with 2 stars will get a boost factor of 1, meaning no boost.

Furthermore, I want to take into account the number of votes each article has. An article with a single vote of 3 stars must rank worse than an article with 4 votes and a 2.8-star average (the boost factors could be 1.2 and 1.3 respectively).

Upvotes: 1

Views: 9682

Answers (3)

Ben

Reputation: 3042

I'm not going to solve your rating system, but here is a general way of normalising values.

Java method:

public static float normalise(float inValue, float min, float max) {
    return (inValue - min)/(max - min);
}

C function:

float normalise(float inValue, float min, float max) {
    return (inValue - min)/(max - min);
}

This method lets you use negative values for both max and min. For example:

variable = normalise(-21.9, -33.33, 18.7);

Note that max and min can't be the same value (that would divide by zero), and max can't be less than min. Also, inValue should be within the given range.
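As a rough sketch of how this function could be applied to the star ratings in the question, the two halves of the scale can be normalised separately: [1, 2] stars onto [0, 1] and [2, 5] stars onto [1, 2]. The helper name piecewiseBoost, the clamping to 1..5, and the choice of a straight-line mapping on each side are assumptions made for illustration, not something the question specifies.

```c
/* General min-max normalisation, same as the function above. */
float normalise(float inValue, float min, float max) {
    return (inValue - min) / (max - min);
}

/* Hypothetical helper (not from the question): maps a 1..5 star rating
 * onto a 0..2 boost factor, with 2 stars as the neutral point (boost 1). */
float piecewiseBoost(float rating) {
    /* Clamp so that inValue always lies inside the range passed to normalise. */
    if (rating < 1.0f) rating = 1.0f;
    if (rating > 5.0f) rating = 5.0f;

    if (rating <= 2.0f) {
        /* [1, 2] stars -> [0, 1]: negative boost */
        return normalise(rating, 1.0f, 2.0f);
    }
    /* (2, 5] stars -> (1, 2]: positive boost */
    return 1.0f + normalise(rating, 2.0f, 5.0f);
}
```

A straight-line mapping like this does not reproduce the illustrative numbers in the question exactly (3.6 stars comes out at about 1.53 rather than 1.4), so the slopes would need adjusting if those exact values matter.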

Write a comment if you need more details.

Upvotes: 1

kba

Reputation: 19466

Based on the numbers in your question, plus a few I made up myself, I came up with these 5 points:

Rating     Boost
1.0        0.5
1.9        0.8
2.0        1.0
3.6        1.4
5.0        2.0

Calculating an approximate linear regression for that, I got the formula y=0.3x+0.34.

So, you could create a conversion function

float ratingToBoost(float rating) {
    return 0.3 * rating + 0.34;
}

Using this, you will get output that approximately fits your requirements. Sample data:

Rating     Boost
1.0        0.64
2.0        0.94
3.0        1.24
4.0        1.54
5.0        1.84

This obviously has linear growth, which might not be what you're looking for, but with only three values specified, it's hard to know exactly what kind of growth you expect. If you're not satisfied with linear growth, and you want, for example, bad articles to be penalized more heavily with a lower boost, you could always come up with some more values and fit an exponential or logarithmic equation instead.
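For anyone who wants to redo the fit with their own points, here is a minimal sketch of an ordinary least-squares line fit. The helper name fitLine is hypothetical, and it assumes at least two points with distinct x values. Run on the five points listed above, an exact fit comes out closer to y = 0.36x + 0.16, so the formula given earlier is best read as a rough approximation.

```c
#include <stddef.h>

/* Hypothetical helper: ordinary least-squares fit of y = slope * x + intercept.
 * Assumes n >= 2 and that the x values are not all identical
 * (otherwise the denominator below is zero). */
void fitLine(const double *x, const double *y, size_t n,
             double *slope, double *intercept) {
    double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumXX = 0.0;

    for (size_t i = 0; i < n; i++) {
        sumX  += x[i];
        sumY  += y[i];
        sumXY += x[i] * y[i];
        sumXX += x[i] * x[i];
    }

    double count = (double)n;
    double denom = count * sumXX - sumX * sumX;

    *slope = (count * sumXY - sumX * sumY) / denom;
    *intercept = (sumY - *slope * sumX) / count;
}
```

Passing the rating column as x and the boost column as y returns the two coefficients to plug into a conversion function like ratingToBoost.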

Upvotes: 0

Sergei Danielian

Reputation: 5015

If I understood you correctly, you should use a sigmoid function, which is a special case of the logistic function. Sigmoid and other logistic functions are often used in neural networks to squash (compress or normalize) the input range of data (for example, to the [-1,1] or [0,1] range).

Upvotes: 3
