a.moussa

Reputation: 3277

Reformat the output of a Spark Streaming DStream count printed with print

I use this line to print the number of records in each batch of my DStream:

myDStream.count().print()

I get something like:

-------------------------------------------
Time: 1501499254000 ms
-------------------------------------------
2

-------------------------------------------
Time: 1501499256000 ms
-------------------------------------------
0

-------------------------------------------
Time: 1501499258000 ms
-------------------------------------------
0

I would simply like to reformat the output like this:

-------------------------------------------
Time: 1501499254000 ms
-------------------------------------------
log.info Got new batch with 2 messages

-------------------------------------------
Time: 1501499256000 ms
-------------------------------------------
log.info Got new batch with 0 messages

-------------------------------------------
Time: 1501499258000 ms
-------------------------------------------
log.info Got new batch with 0 messages

Is there a way to do this?

Upvotes: 1

Views: 702

Answers (1)

maasg

Reputation: 37435

The output format of print is hard-coded in Spark Streaming. To get a different output, we need to roll our own implementation with foreachRDD:

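dstream.foreachRDD { (rdd, time) =>
    // Runs on the driver once per batch; rdd.count() triggers the Spark job.
    val count = rdd.count()
    println("-------------------------------------------")
    println(s"Time: $time") // Time.toString already appends " ms"
    println("-------------------------------------------")
    println(s"log.info Got new batch with $count messages")
    println() // blank line between batches, as print() emits
}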
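
If you would rather send the text to a logger than to stdout, one option is to build the whole block as a single string first. The following is only a sketch: BatchReport and format are hypothetical names, not part of Spark, and the message text is simply copied from the question. The helper is plain Scala, so it can be checked without a running StreamingContext.

```scala
// Hypothetical helper (not a Spark API): builds the per-batch banner as one string.
object BatchReport {
  // Same separator line that appears in the question's output.
  private val Rule = "-------------------------------------------"

  // timeMs: batch time in milliseconds; count: number of records in the batch.
  def format(timeMs: Long, count: Long): String =
    Seq(
      Rule,
      s"Time: $timeMs ms",
      Rule,
      s"log.info Got new batch with $count messages"
    ).mkString("\n")
}

// Assumed usage inside the streaming job (dstream is the stream from the answer above):
//   dstream.foreachRDD { (rdd, time) =>
//     println(BatchReport.format(time.milliseconds, rdd.count()))
//   }
```

Swapping println for a call on your own logger is then a one-line change, and the formatting logic stays in one place.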

Upvotes: 2
