Thibaud

Reputation: 11

Mahout Recommendation performance issues

I have been working with Mahout to create a recommendation engine based on the following data:

I'm running it on Tomcat with the following JVM arguments:

-Xms1024M -Xmx1024M -da -dsa -XX:NewRatio=9 -server

Recommendations take about 6 seconds, which seems slow. How can I improve Mahout's performance?

I'm using the following code.

This part runs once at startup:

JDBCDataModel jdbcDataModel = new MySQLJDBCDataModel(dataSource);
dataModel = new ReloadFromJDBCDataModel(jdbcDataModel);

ItemSimilarity similarity = new CachingItemSimilarity(new EuclideanDistanceSimilarity(dataModel), dataModel);
SamplingCandidateItemsStrategy strategy = new SamplingCandidateItemsStrategy(10, 5);
recommender = new CachingRecommender(new GenericItemBasedRecommender(dataModel, similarity, strategy, strategy));

And for every user request I call:

recommender.recommend(userId, howMany);

Upvotes: 1

Views: 592

Answers (1)

Zasz

Reputation: 12538

I would suggest a different approach. Use a nightly job to pre-calculate recommendations for ALL users, and load the results into a MySQL table each night. Showing the recommendations then becomes nothing more than a simple DB call.

Since you have 10K items, calculating recommendations for a single user means Mahout has to internally multiply a (10K x 10K) matrix by a (10K x 1) matrix. Six seconds seems quite fast considering that size. Reference
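To make the size argument concrete, here is a minimal sketch of that item-similarity matrix times user-preference vector product. It only illustrates the cost model described above and is not Mahout's internal code; the class name `ItemScoringSketch` and the toy 3 x 3 data are made up for the example. With n = 10,000 items the inner loop runs n x n = 100,000,000 times per user.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Mahout.
class ItemScoringSketch {

    // scores[i] = sum over j of similarity[i][j] * userPrefs[j],
    // i.e. an (n x n) matrix multiplied by an (n x 1) vector.
    static double[] score(double[][] similarity, double[] userPrefs) {
        int n = similarity.length;
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                sum += similarity[i][j] * userPrefs[j];
            }
            scores[i] = sum;
        }
        return scores;
    }

    public static void main(String[] args) {
        // Toy item-item similarity matrix for 3 items.
        double[][] similarity = {
            {1.0, 0.5, 0.0},
            {0.5, 1.0, 0.25},
            {0.0, 0.25, 1.0}
        };
        // The user rated item 0 with 4.0 and item 2 with 2.0; item 1 is unrated.
        double[] prefs = {4.0, 0.0, 2.0};
        System.out.println(Arrays.toString(score(similarity, prefs)));
    }
}
```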

Now, if you use the RecommenderJob on Hadoop and AWS EMR, it should take roughly under 10 minutes to process data at your scale. Alternatively, you can do the same job in a non-distributed way by simply looping over all users and pre-calculating their recommendations sequentially. The downside is that your recommendations always lag by one day, six hours, or whatever frequency you choose for the job.
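The non-distributed loop could look roughly like the sketch below. To keep it self-contained, `ItemRecommender` is a hypothetical stand-in interface, not a Mahout type, and the results go into an in-memory map rather than a MySQL table. With Mahout you would presumably iterate over `dataModel.getUserIDs()` and call `recommender.recommend(userId, howMany)` as in the question, then write each user's list to the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of Mahout.
class PrecomputeSketch {

    // Stand-in for the real recommender; returns recommended item IDs.
    interface ItemRecommender {
        List<Long> recommend(long userId, int howMany);
    }

    // Runs the recommender once per user and collects the results,
    // which a nightly job would then persist (for example, to MySQL).
    static Map<Long, List<Long>> precomputeAll(Iterable<Long> userIds,
                                               ItemRecommender recommender,
                                               int howMany) {
        Map<Long, List<Long>> results = new LinkedHashMap<>();
        for (long userId : userIds) {
            results.put(userId, new ArrayList<>(recommender.recommend(userId, howMany)));
        }
        return results;
    }

    public static void main(String[] args) {
        // Toy recommender: suggests item IDs userId + 1, userId + 2, ...
        ItemRecommender toy = (userId, howMany) -> {
            List<Long> items = new ArrayList<>();
            for (int k = 1; k <= howMany; k++) {
                items.add(userId + k);
            }
            return items;
        };
        Map<Long, List<Long>> table = precomputeAll(Arrays.asList(1L, 2L), toy, 2);
        System.out.println(table);
    }
}
```

At request time the web application then only reads the stored list for the user, so the expensive computation never runs inside a user request.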

Upvotes: 1
