Gpwner

Reputation: 629

Does the transform operation produce a single RDD in a DStream?

I am using Spark Streaming and I don't really understand the transform operation. Here is my code:

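import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("streaming").setMaster("local[4]")
val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batch interval
val mDstream =
  ssc
    .socketTextStream(args(0), 9999)
    .flatMap(x => x.split(" "))
    .map((_, 1))
    // 10-second window, sliding every 3 seconds
    .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(3))
    // sort each windowed RDD by count, descending
    .transform(rdd => rdd.sortBy(_._2, false))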

I want to know how many RDDs there are in mDstream. Any help is appreciated!

Upvotes: 0

Views: 41

Answers (1)

Yuval Itzchakov

Reputation: 149628

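The function you pass to transform is invoked on the driver, which is how it is able to take an RDD as its input parameter. Note that the sort itself still runs in parallel, for each partition inside the RDD. A DStream produces a single RDD per interval, so mDstream yields one RDD for each slide of the window (every 3 seconds in your code), and each job in your streaming application works on that single RDD.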
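To see this for yourself, here is a minimal sketch, not your exact pipeline. It assumes an in-memory queueStream in place of your socket source, and the object name TransformPerBatch and the sample words are made up for illustration. It uses the two-argument form of transform, so the driver logs one line per batch, each with exactly one RDD:

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

// Hypothetical demo object; not part of the original question.
object TransformPerBatch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("transform-per-batch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumption: an in-memory queue replaces the socket source.
    // With oneAtATime = true, each queued RDD is consumed as one batch.
    val queue = mutable.Queue[RDD[String]](
      ssc.sparkContext.parallelize(Seq("a b a", "c a")),
      ssc.sparkContext.parallelize(Seq("b b c"))
    )

    val sorted = ssc.queueStream(queue, oneAtATime = true)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .transform { (rdd: RDD[(String, Int)], time: Time) =>
        // This closure is invoked on the driver, once per batch,
        // and receives the single RDD generated for that batch.
        println(s"transform called for batch $time, partitions = ${rdd.getNumPartitions}")
        // The sort is a distributed operation over the RDD's partitions.
        rdd.sortBy(_._2, ascending = false)
      }

    sorted.print() // an output operation is required before start()
    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // run for roughly five batches
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Once the queue is drained, the stream keeps producing (empty) RDDs, so you should still see one "transform called" line per batch interval.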

Upvotes: 1
