Reputation: 1
I am building a recommender system from query logs. For each query, the log records which links the user clicked. Users do not provide any ratings for the links they visit. I want the system to suggest: "If you clicked this link, try this one, which a similar user also clicked." I am exploring Apache Spark MLlib and its collaborative filtering support for this purpose. Unfortunately, the ALS algorithm takes "ratings" data as input.
Here is one of the solutions I got online:
"For each page we want recommendations for, we search for all the users who have viewed that page. Then, for each of those users, we look up all other pages they have viewed. We then count the number of users which have viewed each page in this data set, and use those with the highest count as our recommendations."
The user suggests that this approach is slow.
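To make the quoted approach concrete, here is a minimal in-memory sketch of it in plain Python. The function name `recommend`, the `(user, page)` pair format, and the sample data are my own assumptions for illustration, not part of the original solution.

```python
from collections import Counter, defaultdict

def recommend(clicks, page, top_n=3):
    """Co-occurrence recommendations for `page`.

    clicks: iterable of (user, page) pairs (hypothetical log format).
    """
    pages_by_user = defaultdict(set)
    users_by_page = defaultdict(set)
    for user, p in clicks:
        pages_by_user[user].add(p)
        users_by_page[p].add(user)

    # For every user who viewed `page`, count each other page they viewed.
    counts = Counter()
    for user in users_by_page.get(page, ()):
        for other in pages_by_user[user]:
            if other != page:
                counts[other] += 1

    # Pages viewed by the most of those users come first.
    return [p for p, _ in counts.most_common(top_n)]

clicks = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"), ("u2", "c"),
    ("u3", "c"),
]
print(recommend(clicks, "a"))  # ['b', 'c']
```

Each call walks every user who viewed the page and every page those users viewed, which is probably why it gets slow on a large log.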
Is there a good way to 'fake' the rating data, or is there a popular open-source implementation that does not require rating data?
Upvotes: 0
Views: 741
Reputation: 735
With implicit feedback, the ratings can be counts as well. For example, (user1, url1, 1/0), where 1/0 means clicked or not.
Now you are asking a different question, but anyway: there is a difference between sparse matrices and dense matrices. You do not need to add any zeros; that is the idea of the ratings. You only list the pairs for which you have a click, for example (u1, url1, 1). If that is the only URL user 1 clicked, that's it: you do not need to add zeros for the URLs they have not clicked yet. The model knows this is the input data format being used.
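As a rough sketch of what that sparse input looks like, the snippet below turns a click log into (user, url, count) triples, with one triple per observed pair and no zero rows. The helper name `to_implicit_ratings` and the sample log are assumptions made up for this example.

```python
from collections import Counter

def to_implicit_ratings(click_log):
    """Turn (user, url) click events into sparse (user, url, count) triples.

    Only observed pairs appear; unclicked URLs are simply absent.
    """
    counts = Counter(click_log)
    return sorted((user, url, float(n)) for (user, url), n in counts.items())

# Hypothetical click log: u2 clicked url2 twice.
click_log = [("u1", "url1"), ("u2", "url1"), ("u2", "url2"), ("u2", "url2")]
ratings = to_implicit_ratings(click_log)
for row in ratings:
    print(row)
```

Triples like these are the kind of input you would hand to MLlib's `ALS.trainImplicit`. Note that MLlib expects integer user and item IDs, so in a real pipeline you would first map the user names and URLs to integers.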
I hope it helps.
Upvotes: 1