Reputation: 149
I am trying to learn Flink and I am doing the basic WordCount tutorial. I was wondering how I could sort the output of a datastream so that it outputs the counts in descending order. I don't need this saved as a text file just output to the console in descending order.
The following is within the main function
DataStream<String> text = env.readTextFile(<PATH TO TEXT>)
DataStream<Tuple2<String, Integer>> counts =
text.flatMap(new Tokenizer())
.keyBy(0)
.sum(1);
counts.print();
Right now this writes all the counts with no issues I would like to simply have the counts sorted in descending order (by the value of the count). I was trying to get this to work with .addSink() but I do not understand how to sort with this.
Inside the main function
counts.addSink(new CustomSink());
Outside the main function
public static final class CustomSink implements SinkFunction<Tuple2<String, Integer>> {
public void invoke(Integer value) throws Exeception {
}
}
Upvotes: 0
Views: 436
Reputation: 43717
Sorting by anything other than timestamps is fundamentally incompatible with unbounded streaming.
Sorting over bounded streams can easily be done with Flink's SQL/Table API. There isn't a good way to do this with the DataStream API.
Upvotes: 3