Reputation: 940
I am reading stream of data in my spark application from kafka stream. My requirement is to produce product recommendation for a user when he makes any request (search/browse etc.)
I already have a trained model containing score of users. I am using Java and org.apache.spark.mllib.recommendation.MatrixFactorizationModel model to read the model once at start of my spark application. Whenever there is any browsing event, I call recommendProducts(user_id, num_of_recommended_products) API to produce recommendation for a user from my already existing trained model.
This API is taking ~3-5 seconds for generating result per user which is very slow and hence my stream processing lags behind. Are there any ways in which I can optimise the time of this API? I am considering increasing stream duration from 15 seconds to 1 minute as an optimisation (not sure of its results now)
Upvotes: 0
Views: 184
Reputation: 330383
Calling recommendProducts
in real time, doesn't make much sense. Since ALS model can make predictions only for users, which has been seen in the training dataset, it is better to recommendProductsForUser
once, store the output in a store which supports first lookups by key and fetch results from there, when needed.
If adding storage layer is not an option, you can also take output of recommendProductsForUser
, partition by id, checkpoint and cache predictions, and then join
with input stream by id.
Upvotes: 1