Reputation: 33273
I have been trying to solve this for a long time, but I am unable to think of a clean data structure for the following.
I have a csv file as follows:
(columns: user_id, left to right; rows: item_id, top to bottom; cells: ratings)
So for example:
1,2,3,4,..
a,4, ,2, ,...
b, ,2,3, ,..
c, ,1,2,3,
d
and so on. A blank value means that the user hasn't rated the given item. Now, for a given user (say 1), I have this dictionary:
weight_vector = {2:0.3422,3:0.222}
The computation I want to do is the following:
For user 1, I want to assign a rating to each missing value (items b and c) as follows:
rating_for_item_for_user_1 = (rating_given_by_user_2 * weight_2 + rating_given_by_user_3 * weight_3) / (weight_2 + weight_3)
If user 2 or 3 has not rated a given item, then that user's weight is 0.
I have a feeling that this should be fairly straightforward with numpy, but I have not been able to think it through.
Upvotes: 0
Views: 234
Reputation: 6181
Another possibility is to use defaultdict from collections: http://docs.python.org/2/library/collections.html#collections.defaultdict
from collections import defaultdict
ratings = defaultdict(float)  # missing keys default to 0.0; avoid naming it `dict`, which shadows the built-in
ratings[x] = 0  # x stands for whatever key you use
If you want it as a matrix that you can access both column-wise and row-wise, you might load it into two different data structures, or load it into one, do the calculation, and then transpose it.
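As a rough sketch of the two-structure idea, assuming the CSV has already been parsed into (item_id, user_id, rating) triples with blank cells left out. The names `rows`, `by_user`, `by_item` and `predict` are made up for illustration, and the data is the small example from the question:

```python
from collections import defaultdict

# Hypothetical parsed form of the question's table: (item_id, user_id, rating).
# Blank cells are simply absent.
rows = [
    ("a", 1, 4.0), ("a", 3, 2.0),
    ("b", 2, 2.0), ("b", 3, 3.0),
    ("c", 2, 1.0), ("c", 3, 2.0), ("c", 4, 3.0),
]

by_user = defaultdict(dict)  # user_id -> {item_id: rating}
by_item = defaultdict(dict)  # item_id -> {user_id: rating}
for item, user, rating in rows:
    by_user[user][item] = rating
    by_item[item][user] = rating


def predict(item, weight_vector):
    """Weighted average of the neighbours' ratings for one item.

    Neighbours who have not rated the item are skipped, which is the same
    as giving them weight 0. Returns None if no neighbour rated the item.
    """
    rated = by_item.get(item, {})
    num = sum(rated[u] * w for u, w in weight_vector.items() if u in rated)
    den = sum(w for u, w in weight_vector.items() if u in rated)
    return num / den if den > 0 else None


weight_vector = {2: 0.3422, 3: 0.222}
predicted_b = predict("b", weight_vector)  # users 2 and 3 both rated b
predicted_a = predict("a", weight_vector)  # only user 3 rated a
predicted_d = predict("d", weight_vector)  # nobody rated d
```

Keeping both `by_user` and `by_item` costs twice the memory, but it avoids transposing when you need to look things up in either direction.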
Upvotes: 1
Reputation: 66850
Let's assume that you have a `ratings` matrix and a list of weight vectors `weights`. Then you can simply do the following (assuming that the "empty" fields are zeros; this is a border case you have to think about, because you can hit a division by zero either way when none of a user's neighbours has rated some item):
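import numpy as np
# ratings[user, item] is a float array; weights[x] maps each neighbour n of user x to its weight
empty = np.where(ratings == 0)
for x, y in zip(empty[0], empty[1]):
    ratings[x, y] = sum(ratings[n, y] * weights[x][n] for n in weights[x] if ratings[n, y] != 0) / sum(weights[x][w] for w in weights[x] if ratings[w, y] != 0)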
To prevent division-by-zero errors, you could just check the normalizer before the assignment:
empty = np.where(ratings == 0)
for x, y in zip(empty[0], empty[1]):
    # total weight of the neighbours of user x who actually rated item y
    normalizer = sum(weights[x][w] for w in weights[x] if ratings[w, y] != 0)
    if normalizer > 0:
        ratings[x, y] = sum(ratings[n, y] * weights[x][n] for n in weights[x] if ratings[n, y] != 0) / normalizer
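The same computation can also be sketched without the Python loop. This assumes a users-by-items float array in which 0 means "not rated" (so 0 is never a real rating), and a dense weight vector `w` for one user instead of the dictionary. The array layout and the names `w`, `rated`, `predicted` and `filled` are my own choices for illustration, and the numbers come from the question's example:

```python
import numpy as np

# Rows are users 1..4, columns are items a..c; 0 marks "not rated".
ratings = np.array([
    [4.0, 0.0, 0.0],   # user 1
    [0.0, 2.0, 1.0],   # user 2
    [2.0, 3.0, 2.0],   # user 3
    [0.0, 0.0, 3.0],   # user 4
])

# Dense form of weight_vector = {2: 0.3422, 3: 0.222} for user 1:
# entry j is the weight of user j + 1, and 0 for non-neighbours.
w = np.array([0.0, 0.3422, 0.222, 0.0])

rated = ratings != 0                    # True where a rating exists
numerator = w @ ratings                 # unrated cells are 0, so they add nothing
normalizer = w @ rated.astype(float)    # total weight of users who rated each item

# Divide only where some neighbour rated the item; elsewhere leave 0.
predicted = np.divide(numerator, normalizer,
                      out=np.zeros_like(numerator), where=normalizer > 0)

# Fill only the cells user 1 (row 0) left empty.
filled = ratings.copy()
missing = ~rated[0]
filled[0, missing] = predicted[missing]
```

Unlike the in-place loop, this writes into a copy, so predictions for one cell never feed into the prediction for another.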
Upvotes: 1