Prabhat

Reputation: 127

How to print the last n lines of a DStream in Spark Streaming?

In Spark Streaming, print() on a DStream displays only the first 10 lines of each batch, for example:

val fileDstream = ssc.textFileStream("hdfs://localhost:9000/abc.txt")
fileDstream.print()

Is there a way to get the last n lines, considering that the text file is large and unsorted?

Upvotes: 0

Views: 1194

Answers (1)

Manish Saraf Bhardwaj

Reputation: 1058

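The simplest approach is to collect each batch and take its last element:

fileDstream.foreachRDD { rdd =>
  // lastOption avoids the exception that .last throws on an empty batch;
  // use takeRight(n) instead to get the last n lines
  rdd.collect().lastOption.foreach(println)
}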

However, this has the problem of collecting all data to the driver.

Is your data sorted? If so, you could reverse the sort and take the first element. Alternatively, a hacky implementation might use mapPartitionsWithIndex to return an empty iterator for every partition except the last. For the last partition, you would drop every element of the iterator except the final one. That leaves a single element, which is the last element of the RDD.
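A rough sketch of that per-partition idea, extended from the last element to the last n lines, might look like the following. It is a variation rather than the exact approach described above: it uses mapPartitions and keeps the last n lines of every partition, so the result is still correct when the final partition holds fewer than n lines. The names LastLines and lastN are hypothetical, introduced only for illustration, and the sketch assumes that the RDD's partition order follows the line order of the input.

```scala
import org.apache.spark.rdd.RDD
import scala.collection.mutable

object LastLines {
  // Hypothetical helper, not part of the Spark API: returns the last n
  // elements of an RDD in their original order.
  def lastN(rdd: RDD[String], n: Int): Array[String] = {
    val tails = rdd.mapPartitions { iter =>
      // Sliding buffer: holds at most n elements of this partition at a time.
      val buf = mutable.Queue.empty[String]
      iter.foreach { line =>
        buf += line
        if (buf.size > n) buf.dequeue()
      }
      buf.iterator
    }
    // At most n * numPartitions elements reach the driver. collect() returns
    // them in partition order, so the overall tail is the last n of these.
    tails.collect().takeRight(n)
  }
}
```

It could then be called per batch, for example fileDstream.foreachRDD { rdd => LastLines.lastN(rdd, 10).foreach(println) }. Only the per-partition tails are sent to the driver, not the whole batch.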

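Or you can try top with a reversed ordering:

fileDstream.foreachRDD { rdd =>
  // top(n)(ord) returns the n largest elements under ord, so a reversed
  // ordering yields the 10 smallest lines by value, not the last 10 by position
  rdd.top(10)(Ordering[String].reverse).foreach(println)
}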

Upvotes: 1
