Reputation: 6513
I've built a simple web-based (Spring Boot) recommendation engine using Mahout, configured with:
All the beans are decorated with their caching counterparts.
Dataset is:
Read from a MySQLJDBCDataModel:
CREATE TABLE `taste_preferences` (
`user_id` bigint(20) DEFAULT NULL,
`item_id` int(11) NOT NULL DEFAULT '0',
`preference` int(11) NOT NULL,
`timestamp` datetime DEFAULT NULL,
KEY `idx_taste_preferences_user_id` (`user_id`),
KEY `idx_taste_preferences_item_id` (`item_id`),
KEY `idx_taste_preferences_preference` (`preference`),
KEY `idx_taste_preferences_distinct` (`user_id`,`item_id`,`preference`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
In such a scenario I use a 0.003 sampling rate (I imagine this means using about 12K taste preferences).
Even with this sampling rate, the first recommendation for a given user still takes 10 to 20 seconds.
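The 12K estimate is plain arithmetic, sketched below. The total of roughly 4 million rows is an assumption implied by 12K / 0.003, not a number measured here, and `SamplingEstimate` is a hypothetical helper, not part of Mahout. It also assumes the rate applies to preference rows; if the neighborhood samples candidate users instead, the same arithmetic applies to the user count.

```java
// Hypothetical helper, not a Mahout class.
final class SamplingEstimate {

    // Expected number of elements kept when each one is retained
    // independently with probability samplingRate.
    static long expectedSampleSize(long total, double samplingRate) {
        if (samplingRate <= 0.0 || samplingRate > 1.0) {
            throw new IllegalArgumentException("samplingRate must be in (0, 1]");
        }
        return Math.round(total * samplingRate);
    }

    public static void main(String[] args) {
        long assumedRows = 4_000_000L; // assumption: implied by 12K / 0.003
        System.out.println(expectedSampleSize(assumedRows, 0.003));
    }
}
```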
How would you suggest improving performance on the same hardware? Would a FileDataModel be faster?
Upvotes: 0
Views: 321
Reputation: 6513
Okay, performance is now definitely better! The key point is to wrap the data model in a ReloadFromJDBCDataModel, which serves preferences from an in-memory copy instead of querying MySQL for every lookup:
DataModel currentDataModel() throws TasteException {
    // Wrap the JDBC-backed model so that preferences are read from an in-memory copy.
    DataModel datamodel = new ReloadFromJDBCDataModel(
            new MySQLJDBCDataModel(new ConnectionPoolDataSource(datasource), preferenceTable, userIDColumn,
                    itemIDColumn, preferenceColumn, timestampColumn));
    return datamodel;
}
The data model in this scenario is read-only, but that can be a non-issue if it is reloaded automatically behind the scenes.
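To make the "autoreload" idea concrete, here is a minimal JDK-only sketch of a read-only in-memory snapshot that is swapped for a fresh one on a schedule. `ReloadingSnapshot` and its `loader` are hypothetical names for illustration and are not Mahout classes; the loader stands in for a full read of the preference table.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper: holds a read-only snapshot and replaces it with a fresh one on demand.
final class ReloadingSnapshot<T> {
    private final Supplier<T> loader;          // stands in for a full read of the backing table
    private final AtomicReference<T> current;

    ReloadingSnapshot(Supplier<T> loader) {
        this.loader = loader;
        this.current = new AtomicReference<>(loader.get()); // initial load
    }

    // Readers always see a complete snapshot, never a half-loaded one.
    T get() {
        return current.get();
    }

    // Build the new snapshot first, then publish it atomically.
    void reload() {
        current.set(loader.get());
    }

    // Reload at a fixed interval on a daemon thread; the caller may shut the scheduler down.
    ScheduledExecutorService reloadEvery(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "snapshot-reload");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                reload();
            } catch (RuntimeException e) {
                // Keep serving the previous snapshot; the next tick retries.
            }
        }, period, period, unit);
        return scheduler;
    }
}
```

With Mahout itself, the corresponding step would presumably be a scheduled call to `refresh(null)` on the recommender or on the ReloadFromJDBCDataModel, which, as far as I know, triggers a reload of the in-memory copy.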
For the sake of completeness, the significant parts of my configuration are:
UserSimilarity similarity(DataModel dataModel) throws TasteException {
    return new CachingUserSimilarity(new EuclideanDistanceSimilarity(dataModel, Weighting.WEIGHTED), dataModel);
}
UserNeighborhood userNeighborhood;
UserNeighborhood neighborhood(DataModel dataModel, UserSimilarity userSimilarity) throws TasteException {
    if (useThresholdUserNeighborhood) {
        logger.info("Using ThresholdUserNeighborhood - threshold value is {}", threshold);
        userNeighborhood = new CachingUserNeighborhood(
                new ThresholdUserNeighborhood(threshold, userSimilarity, dataModel), dataModel);
    } else {
        logger.info(
                "Using NearestNUserNeighborhood - neighborhood size is {}, min similarity is {}, sampling rate is {}",
                neighborhoodSize, minSimilarity, samplingRate);
        userNeighborhood = new CachingUserNeighborhood(new NearestNUserNeighborhood(neighborhoodSize, minSimilarity,
                userSimilarity, dataModel, samplingRate), dataModel);
    }
    return userNeighborhood;
}
@Bean
public Recommender buildRecommender(DataModel dataModel) throws TasteException {
    UserSimilarity userSimilarity = similarity(dataModel);
    return new CachingRecommender(
            new GenericUserBasedRecommender(dataModel, neighborhood(dataModel, userSimilarity), userSimilarity));
}
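All of the Caching* wrappers above follow the same decorator idea: the first request for a user pays the full cost, and later requests are answered from memory. That is why caching alone does not help the very first recommendation. A minimal JDK-only sketch of the pattern follows; `CachingLookup` is a hypothetical stand-in, not how Mahout implements its caches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Hypothetical stand-in for a caching decorator around an expensive per-user computation.
final class CachingLookup<V> {
    private final LongFunction<V> delegate;               // the expensive computation being wrapped
    private final Map<Long, V> cache = new ConcurrentHashMap<>();

    CachingLookup(LongFunction<V> delegate) {
        this.delegate = delegate;
    }

    // The first call for a userId runs the delegate; later calls return the cached value.
    V get(long userId) {
        return cache.computeIfAbsent(userId, id -> delegate.apply(id));
    }

    // Drop all cached values, for example after the underlying data has been reloaded.
    void clear() {
        cache.clear();
    }
}
```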
Upvotes: 1