Reputation: 794
I'm using Mahout's ItemSimilarityJob to compute item similarities from an input .csv file whose lines look like this:
user_id, song_id, listens (all three fields are numeric)
When I run ItemSimilarityJob with these parameters:
$MAHOUT_HOME/bin/mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input inputcsv/ --output outputcsv --similarityClassname SIMILARITY_PEARSON_CORRELATION --tempDir tempcsv --booleanData true
I get an empty part-r-00000 file in the music/csvoutput directory, even though music/csvtemp contains many files. What could be the reason?
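A quick way to rule out malformed input is to check every row of the file before running the job. The sketch below is a hypothetical pre-check (`check_rows` is a made-up name, not part of Mahout). It assumes the file has no header line and that every line must be a user_id, song_id, listens triple with numeric fields, as described above.

```python
import csv
import io

def check_rows(lines):
    """Hypothetical pre-check: return (line_number, reason) for each row that is
    not a user_id,song_id,listens triple with numeric fields. Assumes no header."""
    problems = []
    for lineno, row in enumerate(csv.reader(lines), start=1):
        if len(row) != 3:
            problems.append((lineno, f"expected 3 fields, got {len(row)}"))
            continue
        user_id, song_id, listens = (field.strip() for field in row)
        # IDs must be whole numbers; listens may be any number.
        if not (user_id.isdigit() and song_id.isdigit()):
            problems.append((lineno, "user_id and song_id must be integers"))
            continue
        try:
            float(listens)
        except ValueError:
            problems.append((lineno, "listens must be numeric"))
    return problems

# A stray header line and a short row are both reported.
sample = io.StringIO("1,100,5\n2,101,3\nuser_id,song_id,listens\n3,102\n")
for lineno, reason in check_rows(sample):
    print(lineno, reason)
```

On a real file you would pass an open file object instead, e.g. `check_rows(open("input.csv", newline=""))`, where the file name is only a placeholder.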
Upvotes: 0
Views: 1058
Reputation: 1
mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i intro.csv --output outputcsv --similarityClassname SIMILARITY_PEARSON_CORRELATION -m 3 --tempDir tempcsv --threshold 0.7 --booleanData
This will work; use it.
Upvotes: 0
Reputation: 121
I hope my experience helps others; knowing this could have saved me some precious time. You should also check the value of the --threshold parameter. If it is set too high (in my case even 0.01 was too high), Mahout discards the item pairs whose similarity falls below it and can end up writing empty output files. In my case, my randomly generated data was the cause.
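To illustrate how a threshold can empty the output, here is a simplified sketch: it computes a plain Pearson correlation between two items over the users who listened to both, and keeps only pairs whose similarity is at or above the threshold. This is not Mahout's implementation (its distributed similarity measures are computed differently); `pearson` and `similar_pairs` are hypothetical helpers for illustration only.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if undefined
    (fewer than 2 values or zero variance in either list)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)

def similar_pairs(prefs, threshold):
    """prefs: {item_id: {user_id: listens}}. Keep item pairs whose similarity,
    computed over users who listened to both items, is >= threshold."""
    kept = {}
    for a, b in combinations(sorted(prefs), 2):
        common = sorted(prefs[a].keys() & prefs[b].keys())
        sim = pearson([prefs[a][u] for u in common], [prefs[b][u] for u in common])
        if sim is not None and sim >= threshold:
            kept[(a, b)] = sim
    return kept

prefs = {
    100: {1: 5, 2: 3, 3: 1},
    101: {1: 4, 2: 2, 3: 1},
    102: {1: 1, 2: 3, 3: 5},
}
print(similar_pairs(prefs, 0.5))   # keeps only the pair (100, 101)
print(similar_pairs(prefs, 0.99))  # {} : every pair is filtered out
```

Here the best pair correlates at roughly 0.98, so a threshold of 0.99 leaves nothing to write. With random data, many pairs have a similarity near zero or below, so even a small threshold can remove most or all of them.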
Upvotes: 1
Reputation: 66896
Probably your input isn't where you think it is, or you're not pointing the job where you think you are. --input is usually given as a fully qualified path, so check that and try it. Alternatively, your data may be so small that no similarities can be computed.
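Item-to-item similarity can only be computed for items that share users, so one hedged sanity check for the "data too small" case is to count item pairs that have listeners in common. `co_occurring_pairs` is a hypothetical helper and `min_common_users` is an illustrative parameter, not a Mahout option.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

def co_occurring_pairs(lines, min_common_users=1):
    """Count item pairs that share at least min_common_users users.
    If this is 0, there is nothing for an item-item similarity to work with."""
    users_by_item = defaultdict(set)
    for user_id, item_id, _listens in csv.reader(lines):
        users_by_item[int(item_id)].add(int(user_id))
    return sum(
        1
        for a, b in combinations(sorted(users_by_item), 2)
        if len(users_by_item[a] & users_by_item[b]) >= min_common_users
    )

# Every user listened to a different song, so no two songs share a listener.
tiny = io.StringIO("1,100,5\n2,101,3\n3,102,1\n")
print(co_occurring_pairs(tiny))  # 0
```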
Upvotes: 1