Reputation: 11
I have been working with Mahout to create a recommendation engine based on the following data:
I'm running it on Tomcat with the following JVM arguments:
-Xms1024M -Xmx1024M -da -dsa -XX:NewRatio=9 -server
Recommendations take about 6 s, which seems slow. How can I improve Mahout's performance?
I'm using the following code:
This part is run once at startup:
JDBCDataModel jdbcdatamodel = new MySQLJDBCDataModel(dataSource);
dataModel = new ReloadFromJDBCDataModel(jdbcdatamodel);
ItemSimilarity similarity = new CachingItemSimilarity(new EuclideanDistanceSimilarity(dataModel), dataModel);
SamplingCandidateItemsStrategy strategy = new SamplingCandidateItemsStrategy(10, 5);
recommender = new CachingRecommender(new GenericItemBasedRecommender(dataModel, similarity, strategy, strategy));
And for every user request I do:
recommender.recommend(userId, howMany);
Upvotes: 1
Views: 592
Reputation: 12538
I would suggest a different approach. Use a nightly job to pre-calculate recommendations for ALL users, and load the results into a MySQL table. Showing the recommendations then becomes nothing more than a simple DB call.
Since you have 10K items, calculating recommendations for a single user requires Mahout to internally multiply a (10K x 10K) matrix by a (10K x 1) matrix, so 6 seconds seems quite fast considering the size. Reference
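To make the size argument concrete, here is a minimal sketch of a dense item-similarity matrix multiplied by one user's preference vector. It only illustrates the cost estimate above and is not Mahout's actual code path; the class `ScoreSketch`, its methods, and the tiny 3-item data are hypothetical.

```java
import java.util.Arrays;

class ScoreSketch {

    /** Multiplies an n x n item-similarity matrix by a length-n preference vector. */
    static double[] scores(double[][] similarity, double[] prefs) {
        int n = prefs.length;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                sum += similarity[i][j] * prefs[j];
            }
            out[i] = sum;
        }
        return out;
    }

    /** Number of multiply-add steps the dense product above performs. */
    static long multiplyAdds(int items) {
        return (long) items * items;
    }

    public static void main(String[] args) {
        // Made-up similarities and preferences for three items.
        double[][] s = {{1.0, 0.5, 0.0}, {0.5, 1.0, 0.2}, {0.0, 0.2, 1.0}};
        double[] p = {4.0, 0.0, 2.0};
        System.out.println(Arrays.toString(scores(s, p)));
        // Work per user for a dense product with 10K items.
        System.out.println(multiplyAdds(10_000));
    }
}
```

With 10K items the dense product is on the order of 10^8 multiply-adds per user, before any database access or similarity computation is counted.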
If you use the RecommenderJob on Hadoop with AWS EMR, it should take under 10 minutes to process data at your scale. Alternatively, you can do the same job in a non-distributed way by looping over all users and pre-calculating their recommendations sequentially. The downside is that your recommendations always lag behind by one day, six hours, or whatever frequency you choose for the job.
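The non-distributed variant could look roughly like the sketch below. To keep it runnable without Mahout or a database, the recommender is passed in as a plain function and the results go into an in-memory map; in a real job the function would wrap something like `recommender.recommend(userId, howMany)` and each entry would be written to the MySQL table. The class name `PrecomputeRecommendations`, the `precomputeAll` helper, and the fake recommender are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class PrecomputeRecommendations {

    /**
     * Computes recommendations once for every user, so that serving a request
     * becomes a plain lookup. The recommend function stands in for the real
     * recommender call (user id, how many) -> recommended item ids.
     */
    static Map<Long, List<Long>> precomputeAll(
            Iterable<Long> userIds,
            int howMany,
            BiFunction<Long, Integer, List<Long>> recommend) {
        Map<Long, List<Long>> result = new LinkedHashMap<>();
        for (Long userId : userIds) {
            result.put(userId, new ArrayList<>(recommend.apply(userId, howMany)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in recommender that just fabricates item ids.
        BiFunction<Long, Integer, List<Long>> fake = (userId, n) -> {
            List<Long> items = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                items.add(userId * 100 + i);
            }
            return items;
        };

        Map<Long, List<Long>> table = precomputeAll(Arrays.asList(1L, 2L, 3L), 2, fake);

        // The nightly job would persist these rows instead of printing them.
        table.forEach((user, items) -> System.out.println(user + " -> " + items));
    }
}
```

At request time the web application then only reads the stored row for the user, which is the "simple DB call" mentioned above.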
Upvotes: 1