Reputation: 888
I am new to spark streaming. I am trying to do some exercises on fetching data from kafka and joining with hive table.i am not sure how to do JOIN in spark streaming (not the structured streaming). Here is my code
val ssc = new StreamingContext("local[*]", "KafkaExample", Seconds(1))
val kafkaParams = Map[String, Object](
"bootstrap.servers" -> "dofff2.dl.uk.feefr.com:8002",
"security.protocol" -> "SASL_PLAINTEXT",
"key.deserializer" -> classOf[StringDeserializer],
"value.deserializer" -> classOf[StringDeserializer],
"group.id" -> "1",
"auto.offset.reset" -> "latest",
"enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics = Array("csvstream")
val stream = KafkaUtils.createDirectStream[String, String](
ssc,
PreferConsistent,
Subscribe[String, String](topics, kafkaParams)
)
val strmk = stream.map(record => (record.value,record.timestamp))
Now i want to do join on one of the table in hive. In spark structured streaming i can directly call spark.table("table nanme") and do some join, but in spark streaming how can i do it since its everything based on RDD. can some one help me ?
Upvotes: 1
Views: 925
Reputation: 18023
You need transform.
Something like this is required:
val dataset: RDD[String, String] = ... // From Hive
val windowedStream = stream.window(Seconds(20))... // From dStream
val joinedStream = windowedStream.transform { rdd => rdd.join(dataset) }
From the manuals:
The transform operation (along with its variations like transformWith) allows arbitrary RDD-to-RDD functions to be applied on a DStream. It can be used to apply any RDD operation that is not exposed in the DStream API. For example, the functionality of joining every batch in a data stream with another dataset is not directly exposed in the DStream API. However, you can easily use transform to do this. This enables very powerful possibilities.
An example of this can be found here: How to join a DStream with a non-stream file?
The following guide helps: https://spark.apache.org/docs/2.2.0/streaming-programming-guide.html
Upvotes: 0