Reputation: 629
When using Spark Streaming, I don't really understand the transform operation. Here is my code:
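    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("streaming").setMaster("local[4]")
    val ssc = new StreamingContext(conf, Seconds(1))
    val mDstream =
      ssc
        .socketTextStream(args(0), 9999)
        .flatMap(x => x.split(" "))
        .map((_, 1))
        // sum counts over a 10-second window, sliding every 3 seconds
        .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(3))
        // sort each windowed RDD by count, descending
        .transform(rdd => rdd.sortBy(_._2, ascending = false))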
I want to know how many RDDs there are in mDstream. Any help is appreciated!
Upvotes: 0
Views: 41
Reputation: 149628
transform is a method whose function runs on the driver side, which is how it is able to take an RDD as its input parameter. Note that the sort itself will still run in parallel across the partitions inside the RDD. For each batch, the DStream produces a single RDD, and that RDD is what the batch's job processes.
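As a rough way to observe this yourself, here is a minimal sketch, not a definitive implementation. It assumes an in-memory `queueStream` with made-up sample lines in place of the socket source, so no server is needed. The object name `TransformPerBatch` and the 15-second timeout are hypothetical choices for illustration. The `println` inside `transform` should appear in the driver's output once per generated batch, and `foreachRDD` reports how many partitions that batch's single RDD has.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

// Hypothetical demo object; not part of the original question.
object TransformPerBatch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("transform-per-batch").setMaster("local[4]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumption: made-up in-memory input replaces socketTextStream.
    val queue = mutable.Queue[RDD[String]]()
    (1 to 10).foreach { _ =>
      queue += ssc.sparkContext.parallelize(Seq("a b a", "c a b"))
    }

    val sorted = ssc
      .queueStream(queue, oneAtATime = true)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(3))
      .transform { rdd =>
        // This closure body runs on the driver when a batch is generated.
        println(s"transform called with RDD id ${rdd.id}")
        // The sort itself is distributed across the RDD's partitions.
        rdd.sortBy(_._2, ascending = false)
      }

    // One RDD arrives here per slide interval (every 3 seconds).
    sorted.foreachRDD { rdd =>
      println(s"partitions=${rdd.getNumPartitions}, top=${rdd.take(3).mkString(", ")}")
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(15000) // run briefly, then shut down
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

With a 1-second batch interval and a 3-second slide, the windowed stream should emit one RDD every 3 seconds; each of those RDDs can have several partitions, which is where the parallelism of the sort comes from.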
Upvotes: 1