Reputation: 1425
I'm conducting a feature extraction process for a machine learning problem and I came across an issue.
Consider a set of products. Each product is rated as either 0 or 1, which maps to bad or good, respectively. Now I want to compute, for each unique product, a rating score in the [0, n] interval, where n is an integer greater than 0.
The total number of ratings differs from product to product, so a simple average will cause issues such as:
avg_ratio_score = good_rates / total_rates
a) 1/1 = 1
b) 95/100 = 0.95
Even though ratio a) is higher, ratio b) gives much more confidence to a user. For this reason, I need a weighted average.
The problem is which weight to choose. The products' rating counts vary from around 100 to 100k.
My first approach was the following:
ratings frequency interval    weight
--------------------------    ------
90k - 100k                    20
80k - 90k                     18
70k - 80k                     16
60k - 70k                     14
50k - 60k                     12
40k - 50k                     11
30k - 40k                     10
20k - 30k                     8
10k - 20k                     6
5k - 10k                      4
1k - 5k                       3
500 - 1k                      2
100 - 500                     1
1 - 100                       0.5
weighted_rating_score = good_ratings * weight / total_ratings
At first this sounded like a good solution, but a real example shows it might not be as good as it looks:
a. (90/100) * 0.5 = 0.9 * 0.5 = 0.45
b. (50k/100k) * 20 = 0.5 * 20 = 10
This result suggests that product b) is a much better alternative than product a), but looking at the original ratios that might not be the case.
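For reference, this is a minimal sketch of the interval-weight approach described above; the table of `(upper bound, weight)` pairs mirrors the one in the question, and the function name is just illustrative:

```python
# (upper bound of total ratings, weight) pairs, ascending,
# mirroring the frequency-interval table above.
WEIGHTS = [
    (100, 0.5), (500, 1), (1_000, 2), (5_000, 3), (10_000, 4),
    (20_000, 6), (30_000, 8), (40_000, 10), (50_000, 11),
    (60_000, 12), (70_000, 14), (80_000, 16), (90_000, 18),
    (100_000, 20),
]

def weighted_rating_score(good, total):
    """good_ratings * weight / total_ratings, with the weight
    picked from the first interval whose upper bound covers total."""
    weight = next(w for upper, w in WEIGHTS if total <= upper)
    return good / total * weight

# The problematic comparison from the examples above:
print(weighted_rating_score(90, 100))          # 0.9 * 0.5 = 0.45
print(weighted_rating_score(50_000, 100_000))  # 0.5 * 20  = 10.0
```

Running it reproduces the mismatch: the 90%-rated product scores 0.45 while the 50%-rated one scores 10.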
I would like to know an effective way (if there is one) to calculate the ideal weight, or any similar suggestions.
Upvotes: 0
Views: 3843
Reputation: 11
I believe the answer to your question is subjective, since the importance you attach to the uncertainty caused by the smaller number of samples is also subjective.
However, thinking in terms of a "penalty" for a lower number of samples, I can think of another way to correct the rating. Consider the following formula:
(GoodRates / TotalRates) - alpha * (1 / TotalRates)
This formula causes the rating to approach the simple ratio as TotalRates approaches infinity; effectively, the penalty becomes negligible once the number of rates is in the hundreds or above. Choosing different values of alpha increases or decreases the penalty for a low number of total rates.
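The penalty formula above can be sketched in a few lines (the function name and the default `alpha = 1.0` are my own choices for illustration):

```python
def penalized_score(good, total, alpha=1.0):
    """(good / total) - alpha * (1 / total): the penalty term
    vanishes as the number of total rates grows."""
    return good / total - alpha * (1 / total)

# Small sample: heavy penalty
print(penalized_score(1, 1))      # 1.0 - 1.0 = 0.0
# Large sample: penalty is negligible
print(penalized_score(95, 100))   # 0.95 - 0.01 = 0.94
```

With alpha = 1, the 1/1 product from the question drops to 0.0 while the 95/100 product barely moves, which matches the intuition that the larger sample deserves more trust.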
Of course, you can always consider more complex rating approaches that capture other properties of your data, such as a larger penalty for a higher rating with the same number of observations, and so on.
Upvotes: 1