Vignesh I

Reputation: 2221

Cache RDBMS data in Spark after creating a Spark Streaming context

We use Spark Streaming to get data from Kafka using createDirectStream.
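For context, a minimal sketch of that setup using the spark-streaming-kafka-0-10 integration, assuming a StreamingContext `ssc` already exists; the broker address, group id, and topic name are placeholders:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",  // placeholder broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-consumer-group"         // placeholder group id
)

// Direct stream: one Kafka partition maps to one Spark partition
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Array("mytopic"), kafkaParams)
)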

In the same program, I connect to MySQL to fetch some data from the database, and I would like to cache that result using Spark.

The problem is that I created a Spark Streaming context at the start. To cache the MySQL data, I would have to convert it to an RDD, which is possible only with a SparkContext. Unfortunately, I cannot create a SparkContext, since a context has already been created for Spark Streaming.

I don't want to set spark.driver.allowMultipleContexts = true to allow the JVM to use more than one Spark context, as that might cause problems.

Is there a way to cache this data using Spark? How do we convert the MySQL result to an RDD?

Upvotes: 2

Views: 108

Answers (1)

eliasah

Reputation: 40370

Based on the description of your issue: let's say you initialize a StreamingContext as follows:

val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))

You can always fetch the SparkContext from your streaming context as follows:

val sc = ssc.sparkContext

and then do whatever it is you are trying to do. It is the SparkContext associated with your streaming context, so there is no need to create a new one.
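For instance, to pull the MySQL result into an RDD and cache it, here is a minimal sketch using Spark's built-in JdbcRDD; the connection URL, credentials, table, columns, and bounds are placeholders:

import java.sql.DriverManager
import org.apache.spark.rdd.JdbcRDD

val sc = ssc.sparkContext

// The query must contain exactly two '?' placeholders, which Spark binds
// to the lower/upper bounds used to partition the result set.
val mysqlRdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection(
    "jdbc:mysql://localhost:3306/mydb", "user", "password"), // placeholders
  "SELECT id, name FROM my_table WHERE id >= ? AND id <= ?",
  1L,    // lowerBound
  1000L, // upperBound
  2,     // numPartitions
  rs => (rs.getLong("id"), rs.getString("name"))
).cache() // keep the MySQL result in memory across batches

Once cached, mysqlRdd can be reused against every micro-batch, for example inside stream.transform or stream.foreachRDD, without re-querying MySQL.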

Upvotes: 1
