jeffery.yuan

Reputation: 1255

Can we update an existing model in spark-ml/spark-mllib?

We are using spark-ml to build a model from existing data. New data arrives daily.

Is there a way to read only the new data and update the existing model, instead of reading all the data and retraining every time?

Upvotes: 2

Views: 1480

Answers (2)

mathieu

Reputation: 2428

To complement Florent's answer: if you are not in a streaming context, some Spark MLlib models accept an initialModel as the starting point for an incremental update. See KMeans or GaussianMixture (GMM), for instance.
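To illustrate what seeding from an initial model does, here is a minimal pure-Python sketch (not Spark code) of Lloyd's k-means on 2-D points that starts from yesterday's centers and iterates only over today's batch. In PySpark's RDD-based API the analogous call is passing initialModel=... to KMeans.train or GaussianMixture.train; the function names below (closest, kmeans_warm_start) are hypothetical. Note that in this sketch the returned centers are fitted to the new batch alone: the old model only supplies the starting point, it does not carry the old data's weight forward.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def closest(p: Point, centers: List[Point]) -> int:
    """Index of the center nearest to p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2)


def kmeans_warm_start(data: List[Point], old_centers: List[Point],
                      max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    """Run Lloyd iterations on `data`, seeded with `old_centers` instead of random centers."""
    centers = list(old_centers)
    for _ in range(max_iter):
        # Assignment step: accumulate sum_x, sum_y, count per nearest center.
        sums = [[0.0, 0.0, 0] for _ in centers]
        for p in data:
            s = sums[closest(p, centers)]
            s[0] += p[0]
            s[1] += p[1]
            s[2] += 1
        # Update step: move each center to the mean of its points; empty clusters stay put.
        new_centers = [(s[0] / s[2], s[1] / s[2]) if s[2] else c
                       for s, c in zip(sums, centers)]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # converged: no center moved noticeably
            break
    return centers


if __name__ == "__main__":
    yesterday = [(0.0, 0.0), (10.0, 10.0)]   # centers from the previously trained model
    today = [(1.0, 1.0), (1.0, -1.0), (9.0, 11.0), (11.0, 9.0)]
    print(kmeans_warm_start(today, yesterday))  # [(1.0, 0.0), (10.0, 10.0)]
```

Seeding this way mainly saves iterations and keeps cluster identities stable from day to day; if the old data should still influence the result, a weighted update such as the streaming rule is the better fit.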

Upvotes: 2

Florent Moiny

Reputation: 451

It depends on the model you're using, but for some models Spark does exactly what you want. Look at StreamingKMeans, StreamingLinearRegressionWithSGD, StreamingLogisticRegressionWithSGD and, more broadly, StreamingLinearAlgorithm.
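StreamingKMeans folds each new batch into the existing centers, so older data never has to be re-read. Spark's MLlib clustering guide gives the per-cluster rule c_{t+1} = (c_t n_t α + x_t m_t) / (n_t α + m_t), where c_t is the previous center, n_t the number of points assigned so far, x_t the center of the new batch's points, m_t their count, and α the decay factor (α = 1 uses all history, α = 0 only the latest batch). In the RDD-based API a single batch can be applied with StreamingKMeansModel.update, which also makes a daily batch job possible. The pure-Python sketch below only illustrates the rule and is not Spark code: the function names are hypothetical, keeping the weight as α n_t + m_t is an assumption about Spark's implementation, and other implementation details are omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def nearest(p: Point, centers: List[Point]) -> int:
    """Index of the center nearest to p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2)


def streaming_kmeans_update(centers: List[Point], weights: List[float],
                            batch: List[Point], decay: float = 1.0
                            ) -> Tuple[List[Point], List[float]]:
    """Fold one batch into the model; returns the new (centers, weights)."""
    # Per-cluster sum (x_t * m_t) and count (m_t) of this batch's points.
    sums = [[0.0, 0.0] for _ in centers]
    counts = [0] * len(centers)
    for p in batch:
        i = nearest(p, centers)
        sums[i][0] += p[0]
        sums[i][1] += p[1]
        counts[i] += 1
    new_centers: List[Point] = []
    new_weights: List[float] = []
    for c, n, s, m in zip(centers, weights, sums, counts):
        old = n * decay   # discounted history weight: alpha * n_t
        total = old + m   # assumed new weight: alpha * n_t + m_t
        if m == 0:
            new_centers.append(c)  # no new points for this cluster: center unchanged
        else:
            # c_{t+1} = (c_t * alpha * n_t + x_t * m_t) / (alpha * n_t + m_t)
            new_centers.append(((c[0] * old + s[0]) / total,
                                (c[1] * old + s[1]) / total))
        new_weights.append(total)
    return new_centers, new_weights


if __name__ == "__main__":
    model = [(0.0, 0.0), (10.0, 10.0)]
    seen = [2.0, 2.0]                       # points absorbed so far per cluster
    today = [(2.0, 2.0), (4.0, 0.0)]
    print(streaming_kmeans_update(model, seen, today, decay=1.0))
    # ([(1.5, 0.5), (10.0, 10.0)], [4.0, 2.0])
```

With decay=1.0 the old points and the new ones count equally; lowering decay makes the model forget older days faster, which is the knob to tune when the data drifts.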

Upvotes: 4
