Computing ratings in matrix in python

Question

I have been trying to solve this for a long time, but I am unable to think of a clean data structure for the following.

I have a csv file as follows:

           user_id --->
item_id     ratings
|
|
|
V

So for example:

  1,2,3,4,..
a,4, ,2, ,...   
b, ,2,3, ,..
c, ,1,2,3,
d

and so on... A blank value means that the user hasn't rated that item. Now, for a given user (say 1), I have this dictionary:

weight_vector = {2:0.3422,3:0.222}

The computation I want to do is the following:

For user 1, I want to fill in the missing values (items b and c) with a rating computed as follows:

 rating_for_item_for_user_1 = ([rating_given_by_user_2 * weight_2] + [rating_given_by_user_3 * weight_3]) / [weight_2 + weight_3]

If user 2 or 3 has not rated a given item, then weight = 0.
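
As a concrete check of the formula, here is a minimal sketch that computes the two missing ratings for user 1 from the example table. The nested-dict layout and the `predict` helper are assumptions made for illustration only; they are not the data structure I actually have.

```python
# Ratings from the example table, keyed by item and then by user id.
# A missing key means that user has not rated the item.
ratings = {
    "a": {1: 4, 3: 2},
    "b": {2: 2, 3: 3},
    "c": {2: 1, 3: 2, 4: 3},
}
weight_vector = {2: 0.3422, 3: 0.222}  # neighbours of user 1


def predict(item, weights, ratings):
    """Weighted average of the neighbours' ratings for one item.

    Neighbours who have not rated the item are skipped, which is the
    same as giving them weight 0. Returns None if nobody rated it.
    """
    rated = {u: w for u, w in weights.items() if u in ratings[item]}
    total = sum(rated.values())
    if total == 0:
        return None
    return sum(ratings[item][u] * w for u, w in rated.items()) / total


for item in ("b", "c"):
    print(item, predict(item, weight_vector, ratings))
```

For item b this evaluates (2 * 0.3422 + 3 * 0.222) / (0.3422 + 0.222), and for item c it evaluates (1 * 0.3422 + 2 * 0.222) / (0.3422 + 0.222).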

I have a feeling that this should be fairly straightforward with numpy, but I have not been able to think it through.
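
For reference, one possible way to get the file into a numpy array is sketched below. The inline `raw` string is a hypothetical stand-in for the real file, and storing blanks as 0 and transposing to a users-by-items layout are assumptions, not requirements.

```python
import csv
import io

import numpy as np

# Hypothetical file contents mirroring the example above;
# in practice this would come from open("ratings.csv", newline="").
raw = """,1,2,3,4
a,4, ,2,
b, ,2,3,
c, ,1,2,3
"""

reader = csv.reader(io.StringIO(raw))
header = next(reader)
user_ids = [int(u) for u in header[1:]]  # first header cell is the corner

item_ids = []
rows = []
for row in reader:
    item_ids.append(row[0])
    cells = row[1:]
    # Pad rows that are shorter than the header with blanks.
    cells += [""] * (len(user_ids) - len(cells))
    # Blank (or whitespace-only) cells become 0.0, meaning "not rated".
    rows.append([float(c) if c.strip() else 0.0 for c in cells])

# The file has items as rows; transpose so that ratings[user, item].
ratings = np.array(rows).T
print(ratings.shape)
```

Using 0 as the "not rated" marker only works if 0 is never a real rating; `np.nan` would be an alternative marker.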

lejlot · Accepted Answer

Let's assume that you have a rating matrix `ratings` and a list of weight vectors `weights`, and that the "empty" fields are zeros. Division by zero is a border case you have to think about either way, because it occurs whenever none of a user's "neighbours" has rated some item. With that in mind, you can simply do:

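import numpy as np
# ratings[x, y]: rating by user x for item y, as a float array; 0 means "not rated"
empty = np.where(ratings == 0)
for (x, y) in zip(empty[0], empty[1]):
    # weighted average over the neighbours n of user x who rated item y
    ratings[x, y] = sum(ratings[n, y] * weights[x][n] for n in weights[x] if ratings[n, y] != 0) / sum(weights[x][w] for w in weights[x] if ratings[w, y] != 0)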

To prevent division-by-zero errors, you could check the normalizer before the assignment:

empty = np.where(ratings == 0)
for (x, y) in zip(empty[0], empty[1]):
    # total weight of the neighbours of user x who rated item y
    normalizer = sum(weights[x][w] for w in weights[x] if ratings[w, y] != 0)
    if normalizer > 0:
        ratings[x, y] = sum(ratings[n, y] * weights[x][n] for n in weights[x] if ratings[n, y] != 0) / normalizer
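
If the weights can be stored as a dense matrix, the same computation can probably be written without the Python loop. This is only a sketch under assumptions that go beyond the code above: `W` is a hypothetical users-by-users array with `W[x, n]` holding the weight of neighbour n for user x (0 for non-neighbours), `ratings` is users-by-items with 0 meaning "not rated", and `fill_missing` is an illustrative name.

```python
import numpy as np


def fill_missing(ratings, W):
    """Fill zero entries of ratings with a weighted average of neighbours' ratings."""
    ratings = np.asarray(ratings, dtype=float)
    W = np.asarray(W, dtype=float)
    rated = (ratings != 0).astype(float)
    # numer[x, y] = sum over n of W[x, n] * ratings[n, y];
    # unrated entries are 0 and therefore contribute nothing.
    numer = W @ ratings
    # denom[x, y] = total weight of the neighbours of x who rated item y.
    denom = W @ rated
    # Divide only where the normalizer is positive; elsewhere keep 0.
    predicted = np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)
    # Keep the original ratings and fill only the missing ones.
    return np.where(ratings == 0, predicted, ratings)


# Example table from the question, arranged as ratings[user, item].
ratings = np.array([
    [4, 0, 0],  # user 1
    [0, 2, 1],  # user 2
    [2, 3, 2],  # user 3
    [0, 0, 3],  # user 4
], dtype=float)
W = np.zeros((4, 4))
W[0, 1] = 0.3422  # weight of user 2 for user 1
W[0, 2] = 0.222   # weight of user 3 for user 1

result = fill_missing(ratings, W)
print(result[0])
```

One difference from the loop: the loop writes into `ratings` as it goes, so a value filled in earlier can feed into a later prediction, whereas this version computes every prediction from the original ratings only.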
