Reputation: 127
Spark Streaming's DStream print() displays the first 10 lines of each batch, like:
val fileDstream = ssc.textFileStream("hdfs://localhost:9000/abc.txt")
fileDstream.print()
Is there a way to get the last n lines, considering that the text file is large and unsorted?
Upvotes: 0
Views: 1194
Reputation: 1058
If you do this, you could simplify to:
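fileDstream.foreachRDD { rdd =>
  // collect() pulls the whole batch to the driver; lastOption avoids failing on an empty batch
  rdd.collect().lastOption.foreach(println)
}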
However, this has the problem of collecting all data to the driver.
Is your data sorted? If so, you could reverse the sort and take the first element. Alternatively, a hacky implementation might use mapPartitionsWithIndex to return an empty iterator for every partition except the last. For the last partition, you would drop every element except the final one in the iterator. This should leave exactly one element, which is your last element.
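A minimal sketch of the mapPartitionsWithIndex idea, generalized from the last element to the last n. The helper name lastInFinalPartition is hypothetical, and the sketch assumes that partition order follows line order in the file and that the final partition holds at least n lines; if it holds fewer (or is empty), you get back fewer than n elements.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical helper (not part of Spark): returns up to n trailing
// elements of the RDD's last partition, without collecting other partitions.
def lastInFinalPartition[T: ClassTag](rdd: RDD[T], n: Int): Array[T] = {
  val lastIdx = rdd.getNumPartitions - 1
  rdd.mapPartitionsWithIndex { (idx, iter) =>
    if (idx == lastIdx) {
      // Keep a bounded buffer so only n elements are held in memory at once
      val buf = mutable.Queue.empty[T]
      iter.foreach { x =>
        buf.enqueue(x)
        if (buf.size > n) buf.dequeue()
      }
      buf.iterator
    } else {
      // Every other partition contributes nothing
      Iterator.empty
    }
  }.collect()
}
```

Inside the stream it could be called as fileDstream.foreachRDD { rdd => lastInFinalPartition(rdd, 10).foreach(println) }. Only the surviving n elements travel to the driver, which is the point of avoiding a plain collect().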
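Or you can also try top with a reversed ordering, which picks 10 elements by sort order rather than by position in the file:
fileDstream.foreachRDD { rdd =>
  // reverseOrdering is assumed to be an Ordering[String] defined elsewhere, e.g. Ordering[String].reverse
  rdd.top(10)(reverseOrdering).foreach(println)
}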
Upvotes: 1