Spark: How to speed up rdd.count()

Question

We have a streaming application that performs a count action on each RDD of tempRequestsWithState, which is a DStream:

tempRequestsWithState.foreachRDD { rdd =>
    print(rdd.count())
}

The count action is very slow, taking about 30 minutes. We would greatly appreciate it if anyone could suggest a way to speed it up, as we are consuming at 10,000 events/sec. We also noticed that each RDD has 54 partitions.



Answers (1)
