Reputation: 1049
I was doing a bit of reading with respect to matrix factorization methods in recommendation systems and came across this really nice tutorial: http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/
Everything was good, but this paragraph had me intrigued:
A question might have come to your mind by now: if we find two matrices P and Q such that P × Q approximates R, isn’t that our predictions of all the unseen ratings will all be zeros? In fact, we are not really trying to come up with P and Q such that we can reproduce R exactly. Instead, we will only try to minimise the errors of the observed user-item pairs. In other words, if we let T be a set of tuples, each of which is in the form of (u_i, d_j, r_ij), such that T contains all the observed user-item pairs together with the associated ratings, we are only trying to minimise every e_ij for (u_i, d_j, r_ij) in T. (In other words, T is our set of training data.) As for the rest of the unknowns, we will be able to determine their values once the associations between the users, items and features have been learnt.
I was wondering if someone could help me with this? Do latent factors help us in understanding the behavior of each user and item?
Thanks
Upvotes: 2
Views: 774
Reputation: 5067
The latent factors are two sets of values (one set for the users and one for the items) that describe each user and each item. Essentially, what you are trying to do is find a numerical representation of your items and users.
Imagine that you have a movie rating system with 3 factors for users and 3 factors for movies (items). The user factors can be how much the user likes comedies, dramas, or action movies, and the movie factors how much of a comedy, drama, or action movie each film is. From these properties you could estimate the ratings of other pairs. The model finds these abstract factors for you.
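To make the movie example concrete, here is a minimal sketch with hand-picked numbers. The factor values and the variable names are made up for illustration; a trained model would learn them from the ratings, and they would not necessarily correspond to named genres. The score for a pair is the dot product of the two factor vectors.

```python
# Hypothetical factors, in the order [comedy, drama, action].
user_factors = [0.9, 0.1, 0.5]   # how much this user likes each genre
movie_factors = [0.8, 0.0, 0.3]  # how much of each genre this movie is

# The predicted affinity is the dot product of the two vectors.
score = sum(u * m for u, m in zip(user_factors, movie_factors))
print(score)  # roughly 0.87 (0.72 from comedy + 0.15 from action)
```

A user who likes comedies gets a high score for a movie that is mostly comedy, even if that user never rated it.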
This means that you can only find a reasonable representation for items and users for which you have ratings. So when you train your model, you use only the known ratings to estimate this representation. From it you can then predict an unknown rating for a user-item pair, typically as the dot product of the user's and the item's factor vectors.
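Here is a rough sketch of that idea in plain Python, assuming stochastic gradient descent with a small L2 penalty. It is not the tutorial's exact code: the function names (`factorize`, `predict`), the hyperparameters, and the toy ratings are my own choices for illustration. The point is that the loop only ever visits the observed (user, item, rating) tuples, so the missing entries are never treated as zeros.

```python
import random

def factorize(observed, n_users, n_items, k=2, steps=5000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q from observed ratings only.

    observed: list of (user, item, rating) tuples, i.e. the training set T.
    """
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in observed:  # unobserved pairs are never visited
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error plus the L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Toy data: 5 users, 4 items, 13 of the 20 ratings are known.
observed = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 3, 4),
    (4, 1, 1), (4, 2, 5), (4, 3, 4),
]
P, Q = factorize(observed, n_users=5, n_items=4)

# User 0 never rated item 2, but both have learned factors, so we can predict.
print(predict(P, Q, 0, 2))
```

Note that a user or item with no ratings at all would keep its random initial factors, which is why you can only get a reasonable representation for users and items that appear in the training data.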
Upvotes: 1