Polymerase
Polymerase

Reputation: 6791

Spark Streaming how to guarantee order of multiple foreachRDD

I would like to perform a sequence of actions on a DStream. Action N+1 must be executed after action N. What is the difference between these implementations?

val myDStream = ???

//version 1
myDStream.foreachRDD(rdd => action 1)
myDStream.foreachRDD(rdd => action 2)
myDStream.foreachRDD(rdd => action 3)

//version 2
myDStream.foreachRDD{rdd => 
  action 1
  action 2
  action 3
}

Upvotes: 0

Views: 189

Answers (1)

maasg
maasg

Reputation: 37435

If we assume that each action operates on the complete RDD, such as action(rdd), then the two expressions should be equivalent in the order of the results.

At execution level, the top version will generate three spark jobs, while the bottom version will generate only one.

Upvotes: 4

Related Questions