Shobit

Reputation: 794

Mahout Item Similarity Output Empty

I'm using Mahout's ItemSimilarityJob to compute item similarities from an input .csv file whose lines look like this:

user_id (numbers only), song_id (numbers only), listens (numbers only)
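Mahout's Hadoop jobs expect each input line to be of the form userID,itemID,preference, so a header row, a blank line, or Windows (CRLF) line endings may cause problems. A minimal sketch for checking the file locally before uploading it, assuming plain comma-separated lines; the helper name `validate_prefs_csv` and the file name in the usage comment are hypothetical:

```shell
# Minimal sketch: check that every line has exactly three comma-separated
# numeric fields (user_id,song_id,listens). validate_prefs_csv is a
# hypothetical helper, not part of Mahout.
validate_prefs_csv() {
  # Print each offending line with its line number; exit status 1 if any.
  # A trailing carriage return (CRLF file) makes the last field non-numeric,
  # so such lines are reported too.
  awk -F',' '
    NF != 3 { print NR ": expected 3 fields, got " NF ": " $0; bad = 1; next }
    $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+(\.[0-9]+)?$/ {
      print NR ": non-numeric field: " $0; bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Usage (file name is illustrative):
#   validate_prefs_csv listens.csv && echo "format looks OK"
```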

When I run ItemSimilarityJob with these parameters:

$MAHOUT_HOME/bin/mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input inputcsv/ --output outputcsv --similarityClassname SIMILARITY_PEARSON_CORRELATION --tempDir tempcsv --booleanData true

I get an empty part-r-00000 file in the music/csvoutput directory, although music/csvtemp contains many files. What could be the reason?
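Before debugging the job itself, it can help to confirm which output files are actually empty. A small sketch, assuming the output directory has first been copied to the local filesystem (for example with `hadoop fs -get`); the helper name `report_empty_parts` is hypothetical:

```shell
# Minimal sketch: list empty and non-empty part-r-* files in a local copy of
# the job output. report_empty_parts is a hypothetical helper.
# Returns 1 if no part files exist or if any of them is empty.
report_empty_parts() {
  rep_dir=$1
  rep_status=0
  rep_found=0
  for rep_f in "$rep_dir"/part-r-*; do
    [ -e "$rep_f" ] || continue      # the glob matched nothing
    rep_found=1
    if [ -s "$rep_f" ]; then         # -s: file exists and has size > 0
      echo "non-empty: $rep_f"
    else
      echo "EMPTY: $rep_f"
      rep_status=1
    fi
  done
  if [ "$rep_found" -eq 0 ]; then
    echo "no part-r-* files in $rep_dir"
    rep_status=1
  fi
  return "$rep_status"
}
```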

Upvotes: 0

Views: 1058

Answers (3)

narendra kadari

Reputation: 1

mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i intro.csv --output outputcsv --similarityClassname SIMILARITY_PEARSON_CORRELATION -m 3 --tempDir tempcsv --threshold 0.7 --booleanData

This will work; use it.

Upvotes: 0

user1550706

Reputation: 121

I hope my experience and answer help others; knowing this could have saved me some precious time. You will also want to check the value of the --threshold parameter. Setting it too high (in my case, even 0.01 was too high) causes Mahout to filter out the data and eventually generate empty files. In my case, my randomly generated data was the cause.
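To check whether sparse or random data is the culprit, one can count how many users each pair of items has in common. As far as I know, the distributed job only scores item pairs that co-occur in at least one user's history, and pairs backed by only one or two users tend to get unreliable or undefined scores that a threshold may discard. A rough sketch, assuming each (user, item) pair appears only once in the CSV; the helper name `cooccurrence_counts` is hypothetical:

```shell
# Rough sketch: for each pair of items, count the users who have both.
# Assumes a user,item[,pref] CSV where each (user, item) pair appears once.
# cooccurrence_counts is a hypothetical helper, not part of Mahout.
# Output lines: itemA,itemB,users_in_common (most shared first).
cooccurrence_counts() {
  awk -F',' '
    { items[$1] = items[$1] " " $2 }     # append the item to the list for this user
    END {
      for (u in items) {
        n = split(items[u], it, " ")     # whitespace split, leading space ignored
        for (i = 1; i <= n; i++)
          for (j = i + 1; j <= n; j++) {
            a = it[i] + 0; b = it[j] + 0
            if (a == b) continue
            key = (a < b) ? (a "," b) : (b "," a)   # order-independent pair key
            pairs[key]++
          }
      }
      for (k in pairs) print k "," pairs[k]
    }
  ' "$1" | sort -t',' -k3,3nr
}
```

If almost every pair shows a count of 1, or nothing is printed at all, the data gives the job very little to work with, and lowering --threshold alone may not help much.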

Upvotes: 1

Sean Owen

Reputation: 66896

Probably your input is not where you think it is, or you're not pointing --input at the location you think you are. Usually --input should be a fully qualified path; check that and try again. Alternatively, your data may be so small that no similarities can be computed.
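For a local input (for example when Hadoop runs in local/standalone mode), a minimal sketch for turning a relative path into an absolute one and failing early if it does not exist; the helper name `abs_path` is hypothetical. On a cluster, `hadoop fs -ls <path>` is the equivalent check, and relative HDFS paths are resolved against your HDFS home directory (usually `/user/<username>`):

```shell
# Minimal sketch: print the absolute form of a local path, or fail if it is
# missing. abs_path is a hypothetical helper; it uses cd/pwd so it does not
# depend on realpath being installed.
abs_path() {
  if [ -d "$1" ]; then
    (cd "$1" && pwd)
  elif [ -e "$1" ]; then
    printf '%s/%s\n' "$(cd "$(dirname "$1")" && pwd)" "$(basename "$1")"
  else
    echo "no such path: $1" >&2
    return 1
  fi
}

# Usage (illustrative): ... ItemSimilarityJob --input "$(abs_path inputcsv/)" ...
```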

Upvotes: 1
