Reputation: 2281
Hi Mahout community at SO!
I have a couple of questions about speeding up recommendation calculations. On my server I have Mahout installed without Hadoop, and the recommendation script is written in JRuby. The database has 3k users and 100k items (270k rows in the join table). When a user requests recommendations, a simple script runs:
First it establishes a DB connection using PGPoolingDataSource, like this:
connection = org.postgresql.ds.PGPoolingDataSource.new()
connection.setDataSourceName("db_name")
connection.setServerName("localhost")
connection.setPortNumber(5432)
connection.setDatabaseName("db_name")
connection.setUser("mahout")
connection.setPassword("password")
connection.setMaxConnections(100)
connection
I get this warning:
WARNING: You are not using ConnectionPoolDataSource. Make sure your DataSource pools connections to the database itself, or database performance will be severely reduced.
Any ideas how to fix that?
After that I build the recommender and request recommendations:
model = PostgreSQLJDBCDataModel.new(
connection,
'stars',
'user_id',
'repo_id',
'preference',
'created_at'
)
similarity = TanimotoCoefficientSimilarity.new(model)
neighborhood = NearestNUserNeighborhood.new(5, similarity, model)
recommender = GenericBooleanPrefUserBasedRecommender.new(model, neighborhood, similarity)
recommendations = recommender.recommend user_id, 30
Right now it takes about 5-10 seconds to generate recommendations for one user. How can I make this faster (200ms would be nice)?
Upvotes: 2
Views: 1298
Reputation: 66886
If you know you are using a pooling data source, you can ignore the warning. It only means that your data source does not implement the usual interface for pooling implementations, ConnectionPoolDataSource.
You're never going to make this run fast if you run it directly off a database; there is just too much data access. Wrap the JDBCDataModel in a ReloadFromJDBCDataModel and the data will be cached in memory, which should be, literally, 100x faster.
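A minimal JRuby sketch of that wrapping, reusing the table and column names from the question. The helper name build_cached_recommender is hypothetical, and the sketch assumes the Mahout and PostgreSQL JDBC jars are already on the classpath. It is one plausible arrangement, not a tuned setup:

```ruby
# JRuby only: load the Java integration (skipped on other Ruby implementations).
require 'java' if RUBY_PLATFORM == 'java'

# Hypothetical helper: builds a recommender whose data model is held in memory.
# data_source is the javax.sql.DataSource from the question (PGPoolingDataSource).
def build_cached_recommender(data_source)
  # JDBC-backed model, same table and columns as in the question.
  jdbc_model = org.apache.mahout.cf.taste.impl.model.jdbc.PostgreSQLJDBCDataModel.new(
    data_source, 'stars', 'user_id', 'repo_id', 'preference', 'created_at'
  )

  # Loads the table into memory once; later reads do not hit the database.
  model = org.apache.mahout.cf.taste.impl.model.jdbc.ReloadFromJDBCDataModel.new(jdbc_model)

  # Same similarity, neighborhood and recommender as in the question,
  # but built on the in-memory model.
  similarity = org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity.new(model)
  neighborhood = org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood.new(
    5, similarity, model
  )
  org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender.new(
    model, neighborhood, similarity
  )
end
```

The in-memory copy only pays off if the recommender is built once (for example at process start) and reused for every request, e.g. recommender = build_cached_recommender(connection) followed by recommender.recommend(user_id, 30) per request. Rebuilding it on each request would reload the whole table every time.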
Upvotes: 7