djo
djo

Reputation: 149

Sort WordCount Output Flink

I am trying to learn Flink and I am doing the basic WordCount tutorial. I was wondering how I could sort the output of a datastream so that it outputs the counts in descending order. I don't need this saved as a text file just output to the console in descending order.

The following is within the main function

DataStream<String> text = env.readTextFile(<PATH TO TEXT>)
DataStream<Tuple2<String, Integer>> counts = 
     text.flatMap(new Tokenizer())
     .keyBy(0)
     .sum(1);

counts.print();

Right now this writes all the counts with no issues I would like to simply have the counts sorted in descending order (by the value of the count). I was trying to get this to work with .addSink() but I do not understand how to sort with this.

Inside the main function

counts.addSink(new CustomSink());

Outside the main function

public static final class CustomSink implements SinkFunction<Tuple2<String, Integer>> {
     public void invoke(Integer value) throws Exeception {
     }
}

Upvotes: 0

Views: 436

Answers (1)

David Anderson
David Anderson

Reputation: 43717

  • Sorting by anything other than timestamps is fundamentally incompatible with unbounded streaming.

  • Sorting over bounded streams can easily be done with Flink's SQL/Table API. There isn't a good way to do this with the DataStream API.

Upvotes: 3

Related Questions