Reputation: 23
I am reading data from Kafka in a Spark Streaming application and performing two actions.
I want to make sure that, for each RDD in the DStream, the insert into HBase table A happens before the update on HBase table B (i.e. the two actions run sequentially for every RDD).
How can I achieve this in a Spark Streaming application?
Upvotes: 2
Views: 813
Reputation: 1031
Update both tables sequentially inside a single rdd.foreach(). The two operations will execute in order, provided you handle exceptions properly.
This behavior is backed by the fact that the DAG for that foreach runs within a single stage, so the statements in its body execute sequentially for each record.
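A minimal sketch of this approach, using foreachPartition so the HBase connection is created once per partition on the executor. The table names and the buildPutForA/buildUpdateForB helpers are illustrative assumptions, not from the question:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

recordStream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // One connection per partition, created on the executor (not serialized from the driver).
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val tableA = connection.getTable(TableName.valueOf("A"))
    val tableB = connection.getTable(TableName.valueOf("B"))
    try {
      partition.foreach { record =>
        // 1) Insert into table A first.
        tableA.put(buildPutForA(record))     // buildPutForA: hypothetical helper
        // 2) Update table B only after the put above has returned.
        tableB.put(buildUpdateForB(record))  // buildUpdateForB: hypothetical helper
      }
    } finally {
      tableA.close()
      tableB.close()
      connection.close()
    }
  }
}
```

Because both calls sit in the same closure body, the update on B for a given record can only start after the insert into A for that record has completed.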
Upvotes: 0
Reputation: 919
As far as I know, you can perform the above task as shown below; the insert and the update will run sequentially:
recordStream.foreachRDD { rdd => // DStream RDDs of records from Kafka
  val records = rdd.map(line => line.split("\\|")).collect() // note: collect() brings all records to the driver
  records.foreach { record =>
    // write the code for the insert into HBase here
  }
  records.foreach { record =>
    // write the code for the update in HBase here
  }
}
Hope this helps.
Upvotes: 2