Reputation: 47
I'm trying to write my first recommendation model (Spark 2.0.2) and I would like to know whether, after the initial training when the model has processed my whole RDD, it is possible to work with just a delta for future trainings.
Let me explain through an example:
The question is: is it possible to execute step 4 in some way?
Upvotes: 1
Views: 623
Reputation: 74679
My understanding is that it is only possible with machine learning algorithms that are designed to support streaming training like StreamingKMeans or StreamingLogisticRegressionWithSGD.
Quoting their documentation (see the links above):
(StreamingLogisticRegressionWithSGD) trains or predicts a logistic regression model on streaming data. Training uses Stochastic Gradient Descent to update the model based on each new batch of incoming data from a DStream (see LogisticRegressionWithSGD for model equation)
StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming, and using the model to make predictions on streaming data.
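For illustration, here is a minimal sketch of how StreamingKMeans is used to update a model incrementally as new batches arrive, along the lines of the Spark documentation's example. The input directories, cluster count, and dimensions are hypothetical placeholders, and this assumes `spark-mllib` and `spark-streaming` are on the classpath:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object StreamingKMeansSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingKMeansSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical directories watched for new batches of data.
    val trainingData = ssc.textFileStream("/tmp/training").map(Vectors.parse)
    val testData = ssc.textFileStream("/tmp/test").map(LabeledPoint.parse)

    val model = new StreamingKMeans()
      .setK(3)                  // number of clusters
      .setDecayFactor(1.0)      // how much weight to give past data
      .setRandomCenters(2, 0.0) // feature dim = 2, initial weight = 0

    // Each new batch updates the existing cluster centers in place --
    // this is the "train on a delta" behaviour the question asks about.
    model.trainOn(trainingData)
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The key point is that `trainOn` keeps updating the same model on every micro-batch, rather than retraining from the full dataset; an ALS-based recommender in `spark.ml` has no equivalent method.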
What worries me about these algorithms is that they belong to the org.apache.spark.mllib.clustering
package, which is now deprecated (as it is RDD-based, not DataFrame-based). I don't know whether they have JIRAs to retrofit them with DataFrames.
Upvotes: 1