Kevin Cohen

Reputation: 1341

Spark streaming print on received stream

What I am trying to achieve is basically to print "hello world" each time I receive a stream of data.

I know that on each stream I can call the function foreachRDD but that does not help me because:

  1. It might be that there is no data processed
  2. I don't want to print hello on each rdd, I want to print hello on the entire stream (whether I received data or not).

Basically, each time the program tries to fetch data (say every 30 seconds, as set by the Spark streaming context's batch interval), I would like to print hello.

Is there a way of doing this? Is there something like an onlisten event for Spark Streaming?

Upvotes: 2

Views: 3998

Answers (1)

Yuval Itzchakov

Reputation: 149518

In each batch interval (30 seconds in your case), the DStream contains exactly one RDD, which is internally divided into several partitions. You can check whether that RDD is non-empty and only then print hello world:

// Create DStream from source
dstream.foreachRDD { rdd => if (!rdd.isEmpty) println("hello world") }
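
If the greeting should instead appear on every batch, whether or not data arrived, one option is to drop the emptiness check: the function passed to foreachRDD is normally invoked on the driver once per batch interval, even when the RDD is empty. Below is a minimal sketch of a complete program along those lines. The local master, the socket source on localhost:9999, the object name HelloPerBatch, and the batchMessage helper are all assumptions made for illustration and are not part of the original question. The sketch also registers a StreamingListener, which may be the closest thing to the "event" asked about.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

object HelloPerBatch {
  // Hypothetical helper: builds the line printed for each batch.
  def batchMessage(hasData: Boolean): String =
    if (hasData) "hello world (batch had data)" else "hello world (batch was empty)"

  def main(args: Array[String]): Unit = {
    // Assumed setup: local master and a 30-second batch interval.
    val conf = new SparkConf().setMaster("local[2]").setAppName("HelloPerBatch")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Assumed source: a text socket on localhost:9999 (e.g. fed by `nc -lk 9999`).
    val dstream = ssc.socketTextStream("localhost", 9999)

    // This function runs on the driver for each batch, so the println
    // appears in the driver's output whether or not the RDD has data.
    dstream.foreachRDD { rdd => println(batchMessage(!rdd.isEmpty())) }

    // Optional: a listener that is notified when each batch finishes.
    ssc.addStreamingListener(new StreamingListener {
      override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit =
        println(s"batch ${batch.batchInfo.batchTime} done, records: ${batch.batchInfo.numRecords}")
    })

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Whether to print from foreachRDD or from the listener is a design choice: foreachRDD gives access to the batch's data, while the listener only sees batch metadata such as the batch time and record count.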

Upvotes: 4
