Reputation: 417
Using Apache Spark's mllib, I have a Logistic Regression model that I store in HDFS. This Logistic Regression model is trained on historical data coming in from some sensors.
I have another spark program that consumes streaming data from these sensors. I want to be able to use the pre-existing trained model to do predictions on incoming data stream. Note: I don't want my model to be updated by this data.
To load the training model, I'd have to use the following line in my code:
val logisticModel = LogisticRegressionModel.load(sc, <location of model>)
sc: spark context.
However, this application is a streaming application and hence already has a "StreamingContext" setup. Now, from what I've read, it is bad practice to have two contexts in the same program (even though it is possible).
Does this mean that my approach is wrong and I can't do what I'm trying to ?
Also, would it make more sense if I keep storing the stream data in a file and keep running logistic regression on that rather than trying to do it directly in the streaming application ?
Upvotes: 3
Views: 497
Reputation: 330203
StreamingContext
can created in a few ways including two constructors which take an existing SparkContext
:
StreamingContext(path: String, sparkContext: SparkContext)
- where path
is a path to a checkpoint fileStreamingContext(sparkContext: SparkContext, batchDuration: Duration)
So you can simply create SparkContext
, load required models, and create StreamingContext
:
val sc: SparkContext = ???
...
val ssc = new StreamingContext(sc, Seconds(1))
You can also get SparkContext
using StreamingContext.sparkContext
method:
val ssc: StreamingContext = ???
ssc.sparkContext: SparkContext
Upvotes: 2