Reputation: 2088
Is it possible to apply Spark ML regression to streaming sources? I see there is StreamingLogisticRegressionWithSGD,
but it's for the older RDD API and I couldn't use it with Structured Streaming sources.
Upvotes: 1
Views: 1488
Reputation: 35229
Today (Spark 2.2 / 2.3) there is no support for machine learning in Structured Streaming and there is no ongoing work in this direction. Please follow SPARK-16424 to track future progress.
You can, however:
- Train iterative, non-distributed models using the foreach sink and some form of external state storage. At a high level, a regression model could be implemented like this:
  - In ForeachWriter.open, initialize a loss accumulator (not in the Spark sense, just a local variable) for the partition.
  - In ForeachWriter.process, update the accumulator.
  - In ForeachWriter.close, hand the accumulated result over to the external state storage.
- Try to hack SQL queries (see https://github.com/holdenk/spark-structured-streaming-ml by Holden Karau).
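To make the open / process / close idea a bit more concrete, here is a minimal sketch in plain Python. It is an illustration under several assumptions, not a tested production pattern: InMemoryModelStore is a hypothetical stand-in for whatever external state storage you use, the model is a simple linear regression with squared loss, and the writer accumulates a gradient alongside the loss so that the store can apply a plain gradient step. The method names follow the open / process / close shape that PySpark's writeStream.foreach accepts from Spark 2.4 onwards; on Spark 2.2 / 2.3 the same structure would be a Scala or Java ForeachWriter.

```python
class InMemoryModelStore:
    """Hypothetical stand-in for external state storage (database, KV store...).

    On a real cluster the writer is shipped to executors, so this object
    would have to be replaced by a client for a store all executors can reach.
    """

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate
        self.losses = []  # mean loss reported by each partition

    def fetch_weights(self):
        return list(self.weights)

    def push_update(self, grad_sum, loss_sum, count):
        if count == 0:
            return  # empty partition, nothing to learn from
        self.losses.append(loss_sum / count)
        # One gradient-descent step on the mean gradient of the partition.
        self.weights = [
            w - self.learning_rate * g / count
            for w, g in zip(self.weights, grad_sum)
        ]


class RegressionWriter:
    """Accumulates squared loss and its gradient for one partition."""

    def __init__(self, store):
        self.store = store

    def open(self, partition_id, epoch_id):
        # Read the current model and reset the local (non-Spark) accumulators.
        self.weights = self.store.fetch_weights()
        self.grad_sum = [0.0] * len(self.weights)
        self.loss_sum = 0.0
        self.count = 0
        return True  # True means "process the rows of this partition"

    def process(self, row):
        # Assumed row layout: row["features"] is a list of floats,
        # row["label"] is a float.
        features, label = row["features"], row["label"]
        prediction = sum(w * x for w, x in zip(self.weights, features))
        error = prediction - label
        self.loss_sum += 0.5 * error * error
        for i, x in enumerate(features):
            self.grad_sum[i] += error * x
        self.count += 1

    def close(self, error):
        # Only publish the partition's result if processing did not fail.
        if error is None:
            self.store.push_update(self.grad_sum, self.loss_sum, self.count)
```

With PySpark 2.4 or later, an instance of such a class can be passed as df.writeStream.foreach(writer). Keep in mind that this trains on one partition at a time, so how partitions and concurrent updates are reconciled is left entirely to the store.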
Upvotes: 4