Reputation: 1049
I am working on a recommendation engine based on implicit feedback, following this post: http://insightdatascience.com/blog/explicit_matrix_factorization.html#movielens
It uses ALS (alternating least squares) to compute the user and item vectors. Since my data set cannot be partitioned by time, I randomly take 'x' ratings from each user and put them into the test set. This is a reproducible example of my training user-item matrix:
    col1  col2  col3  col4  col5  col6  col7  col8  col9  col10  col11  col12  col13
    1     0     0     3     10    0     0     3     0     0      1      0      0
    0     0     0     5     0     0     1     8     0     0      1      0      0
    0     0     0     6     7     1     0     2     0     0      1      0      0
I then create a test set using this piece of code:

    test_ratings = np.random.choice(counts[user,:].nonzero()[0], size=1, replace=True)
    train[user,test_ratings] = 0
    test[user,test_ratings] = counts[user,test_ratings]
    assert(np.all((train * test) == 0))
Which gives me:
    col1  col2  col3  col4  col5  col6  col7  col8  col9  col10  col11  col12  col13
    0     0     0     0     0     0     0     3     0     0      0      0      0
    0     0     0     0     0     0     1     0     0     0      0      0      0
    0     0     0     6     0     0     0     0     0     0      0      0      0
Here the rows are users and columns are items.
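For reference, a minimal self-contained sketch of this kind of per-user hold-out split, run on the example matrix above. The function name `train_test_split` and the parameters `n_holdout` and `seed` are illustrative assumptions, as is the choice to skip users who have too few nonzero entries to hold any out. It samples without replacement so that holding out more than one item per user never picks the same item twice.

```python
import numpy as np

def train_test_split(counts, n_holdout=1, seed=0):
    """Hold out n_holdout nonzero entries per user (row) for testing."""
    rng = np.random.default_rng(seed)
    train = counts.copy()
    test = np.zeros_like(counts)
    for user in range(counts.shape[0]):
        nonzero_items = counts[user, :].nonzero()[0]
        # Assumption: skip users with too few interactions to hold any out.
        if len(nonzero_items) <= n_holdout:
            continue
        held_out = rng.choice(nonzero_items, size=n_holdout, replace=False)
        train[user, held_out] = 0
        test[user, held_out] = counts[user, held_out]
    # No (user, item) pair may appear in both sets.
    assert np.all((train * test) == 0)
    return train, test

# Rows are users, columns are items (the example matrix from above).
counts = np.array([
    [1, 0, 0, 3, 10, 0, 0, 3, 0, 0, 1, 0, 0],
    [0, 0, 0, 5, 0, 0, 1, 8, 0, 0, 1, 0, 0],
    [0, 0, 0, 6, 7, 1, 0, 2, 0, 0, 1, 0, 0],
])
train, test = train_test_split(counts, n_holdout=1)
print(test)
```

Which item is held out for each user depends on the seed, so the printed test matrix will not necessarily match the one shown above.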
Now, I was wondering if this is a correct representation of my test set. I have picked one nonzero value per user and set everything else to zero, so my algorithm should rank that nonzero item as the recommended item.
Is this the correct way of going about things?
Any help would be really appreciated.
Upvotes: 3
Views: 1119
Reputation: 5067
Updated:
Yes, you should create a test set from some of your original counts and check whether your system identifies those user-item pairs as good matches.
You should be careful with a few things:
Note: This paper, Collaborative Filtering for Implicit Feedback Datasets, should help you with these and other questions.
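One way to check whether the held-out items are ranked highly is a weighted mean percentile rank, in the spirit of the expected percentile ranking that paper describes. The sketch below is only an illustration: the function name `mean_percentile_rank` and the `scores` matrix (predicted preferences, e.g. user vectors times item vectors) are assumptions, and ranking only among items with a zero count in training is a simplification chosen here, not necessarily what the paper does. A value of 0 means every held-out item was ranked first; values around 0.5 are what random ordering would give.

```python
import numpy as np

def mean_percentile_rank(scores, train, test):
    """Weighted mean percentile rank of held-out items (0 = best, 1 = worst)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for user in range(test.shape[0]):
        held_out = test[user].nonzero()[0]
        if len(held_out) == 0:
            continue
        # Assumption: candidates are the items with no training interaction.
        candidates = np.flatnonzero(train[user] == 0)
        if len(candidates) < 2:
            continue
        # Order candidates from highest to lowest predicted score.
        order = candidates[np.argsort(-scores[user, candidates], kind="stable")]
        position = {item: pos for pos, item in enumerate(order)}
        for item in held_out:
            pct = position[item] / (len(order) - 1)
            # Weight each held-out item by its observed count.
            weighted_sum += test[user, item] * pct
            total_weight += test[user, item]
    if total_weight == 0:
        return float("nan")
    return weighted_sum / total_weight

# Toy example with hypothetical scores: one user, four items.
train = np.array([[1, 0, 0, 0]])
test = np.array([[0, 3, 0, 0]])
scores = np.array([[0.1, 0.9, 0.5, 0.2]])
# The held-out item (index 1) has the highest score among the candidates,
# so the result is 0.0.
print(mean_percentile_rank(scores, train, test))
```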
Upvotes: 1