Reputation: 21
Using Apache Flink I want to create a streaming window sorted by the timestamp that is stored in the Kafka event. According to the following article this is not implemented.
https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams
However, the article is dated july 2015, it is now almost a year later. Is this functionality implemented and can somebody point me to any relevent documentation and/or an example.
Upvotes: 2
Views: 1887
Reputation: 18987
Apache Flink supports stream windows based on event timestamps. In Flink, this concept is called event-time.
In order to support event-time, you have to extract a timestamp (long value) from each event. In addition, you need to support so-called watermarks which are needed to deal with events with out-of-order timestamps.
Given a stream with extracted timestamps you can define a windowed sum as follows:
val stream: DataStream[(String, Int)] = ...
val windowCnt = stream
.keyBy(0) // partition stream on first field (String)
.timeWindow(Time.minutes(1)) // window in extracted timestamp by 1 minute
.sum(1) // sum the second field (Int)
Event-time and windows are explained in detail in the documentation (here and here) and in several blog posts (here, here, here, and here).
Upvotes: 2
Reputation: 986
Sorting by timestamps is still not supported out-of-box but you can do windowing based on the timestamps in elements. We call this event-time windowing. Please have a look here: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/windows.html.
Upvotes: 0