Reputation: 509
I'm curious why in the example below the Mahout recommender isn't returning a recommendation for user 1.
My input file is below. I added blank lines to enhance readability. This file will need the blank lines removed before it's run through Mahout.
The columns in this file are:
User ID | item number | item rating
1 101 0
1 102 0
1 103 5
1 104 0
2 101 4
2 102 5
2 103 4
2 104 0
3 101 0
3 102 5
3 103 5
3 104 3
You'll note that item 103 is the only common item that all 3 users rated.
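For anyone reproducing this, here is a minimal sketch of the clean-up step mentioned above (removing the blank lines). It also rewrites the whitespace-separated columns as comma-separated triples, on the assumption that the job wants one `userID,itemID,preference` row per line. The function name `to_mahout_csv` is made up for illustration.

```python
def to_mahout_csv(text):
    """Drop blank lines and join the three columns with commas."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip the blank lines added for readability
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        rows.append(",".join(fields))
    return "\n".join(rows) + "\n"


sample = "1 101 0\n1 102 0\n\n2 101 4\n"
cleaned = to_mahout_csv(sample)
print(cleaned, end="")
```

Writing `cleaned` to `small_data_set.txt` would give a file with no blank lines and one triple per row.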
I ran: hadoop jar C:\hdp\mahout-0.9.0.2.1.3.0-1981\core\target\mahout-core-0.9.0.2.1.3.0-1981-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURRENCE --input small_data_set.txt --output small_data_set_output
The Mahout recommendation output file shows:
2 [104:4.5]
3 [101:5.0]
Which I believe means:
User 2 would be recommended item 104. Since user 3 rated item 104 a 3, this may account for the 4.5 recommendation score vs. the result below.
User 3 would be recommended item 101. Since user 2 rated item 101 a 4, this may account for the slightly higher recommendation score of 5.
Is this correct?
Why isn't user 1 included in the recommendation output file? User 1 could have received a recommendation for Item 102 because user 2 and user 3 rated it. Is the data set too small?
Thanks in advance.
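One way to reason about the scores in the question is to recompute them by hand. The sketch below is not Mahout's code. It rests on three assumptions that the thread does not confirm: a rating of 0 is treated as "no preference", the estimate is an average of the user's own ratings weighted by item co-occurrence counts, and an estimate is only produced when at least two of the user's rated items co-occur with the candidate. The names `recommend` and `min_datapoints` are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

RATINGS = [
    (1, 101, 0), (1, 102, 0), (1, 103, 5), (1, 104, 0),
    (2, 101, 4), (2, 102, 5), (2, 103, 4), (2, 104, 0),
    (3, 101, 0), (3, 102, 5), (3, 103, 5), (3, 104, 3),
]


def recommend(ratings, min_datapoints=2):
    # Assumption: a zero rating means the user has no preference for the item.
    prefs = defaultdict(dict)
    for user, item, value in ratings:
        if value != 0:
            prefs[user][item] = value

    # Count how many users rated each pair of items together.
    cooc = defaultdict(int)
    for items in prefs.values():
        for a, b in combinations(sorted(items), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1

    all_items = {item for _, item, _ in ratings}
    result = {}
    for user, rated in prefs.items():
        scored = []
        for candidate in sorted(all_items - set(rated)):
            used = [(cooc.get((candidate, item), 0), value)
                    for item, value in rated.items()]
            used = [(w, v) for w, v in used if w > 0]
            # Assumption: an estimate needs at least two supporting items.
            if len(used) < min_datapoints:
                continue
            estimate = sum(w * v for w, v in used) / sum(w for w, _ in used)
            scored.append((candidate, estimate))
        if scored:
            result[user] = sorted(scored, key=lambda pair: -pair[1])
    return result


print(recommend(RATINGS))
# {2: [(104, 4.5)], 3: [(101, 5.0)]}
```

Under these assumptions the sketch reproduces the output file. User 1 has only one nonzero rating (item 103), so no candidate has two supporting datapoints and user 1 gets no recommendation. If the assumptions hold, the scores would come from the target user's own ratings rather than from the other users' ratings of the recommended item: 4.5 is the average of user 2's ratings for items 102 and 103.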
Upvotes: 1
Views: 763
Reputation: 5702
Several mistakes may be present in your data; the first two here will cause undefined behavior:
There are very few uses for preference values unless you are trying to predict a user's rating for an item. The preference weights are useless in determining recommendation ranking, which is the typical thing to optimize. If you want to recommend the right things in the right order, toss the values and use LLR (log-likelihood ratio) similarity.
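To make "use LLR" concrete, here is a small sketch of the log-likelihood ratio test on a 2x2 contingency table of co-occurrence counts, in the entropy-based form commonly used for this purpose. The argument names (`k11` = users who interacted with both items, `k12` and `k21` = users who interacted with only one of them, `k22` = users who interacted with neither) are my own labels, and the sketch ignores rating values entirely.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    if row_entropy + column_entropy < matrix_entropy:
        return 0.0  # guard against small negative values from round-off
    return 2.0 * (row_entropy + column_entropy - matrix_entropy)


# Two items that always appear together score high ...
print(log_likelihood_ratio(10, 0, 0, 10))
# ... while items that co-occur exactly as often as chance predicts score ~0.
print(log_likelihood_ratio(5, 5, 5, 5))
```

With the Hadoop job from the question, the corresponding similarity would be selected with `-s SIMILARITY_LOGLIKELIHOOD` instead of `-s SIMILARITY_COOCCURRENCE`.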
The other thing people sometimes do with values is encode strength of preference, so that 1 = a view of a product page and 5 = a product purchase. This will not work! I tried this with a large ecommerce dataset and found the recommendations were worse when adding in product views, even though there was 100 times more data. Views and purchases are fundamentally different user actions with different user intent, so they can't be mixed in this way.
If you really do want to mix different actions, use the new multimodal recommender based on Mahout, Spark, and Solr described on the Mahout site. It allows cross-cooccurrence-type indicator calculations, so you can use user location, likes and dislikes, views and purchases. Virtually the entire user clickstream can be used, but only through cross-cooccurrence, which correlates each action with the canonical "best" action, the one you want to recommend.
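As a rough illustration of the difference between cooccurrence and cross-cooccurrence, the sketch below uses two tiny, made-up user-by-item 0/1 matrices: `purchases` (the primary action) and `views` (a secondary action). Cooccurrence is the product of the transposed purchase matrix with itself; cross-cooccurrence is the product of the transposed purchase matrix with the view matrix. The data and the helper `transpose_times` are hypothetical, and a real pipeline would typically apply an LLR test to these counts rather than use them raw.

```python
def transpose_times(a, b):
    """Return A^T B for two matrices given as lists of equal-length rows.

    Both matrices must have one row per user, in the same user order.
    """
    n_a, n_b = len(a[0]), len(b[0])
    out = [[0] * n_b for _ in range(n_a)]
    for row_a, row_b in zip(a, b):
        for i, x in enumerate(row_a):
            if x:
                for j, y in enumerate(row_b):
                    out[i][j] += x * y
    return out


# Hypothetical data: rows are users, columns are items.
purchases = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]
views = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

cooccurrence = transpose_times(purchases, purchases)
cross_cooccurrence = transpose_times(purchases, views)

print(cooccurrence)        # [[2, 1, 0], [1, 2, 1], [0, 1, 1]]
print(cross_cooccurrence)  # [[2, 2, 1], [1, 2, 2], [0, 1, 1]]
```

Entry `[i][j]` of `cross_cooccurrence` counts the users who purchased item `i` and viewed item `j`, so the view data is always tied back to the purchase action instead of being blended into a single preference value.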
Upvotes: 2