kadsank
kadsank

Reputation: 311

Flink: Watermarking with Late Elements

I am doing real-time streaming in Flink where the Kafka is the message queue. I am applying EventTimeSlidingWindow of 120 sec. and slide of 1 sec. I am also inserting the watermark at each second of Event Time.

My concern is what happened if the element will come late, after the watermark? Now I my case, Flink simply discard the message which come after its respective watermark. Is there any mechanism provided by the filnk to handle such late message, like maintaining separate window? I have also gone through the documentation but I did not get clear about it.

Upvotes: 4

Views: 2344

Answers (3)

Tim Mark
Tim Mark

Reputation: 1

Allowed lateness can result in multiple outputs. So end of window and end of watermark from the last even is one time and then for each element that’s late another aggregated output.

Upvotes: 0

hanif s
hanif s

Reputation: 488

By default, late elements are dropped when the watermark is past the end of the window. However, Flink allows to specify a maximum allowed lateness for window operators. Allowed lateness specifies by how much time elements can be late before they are dropped, and its default value is 0. Elements that arrive after the watermark has passed the end of the window but before it passes the end of the window plus the allowed lateness, are still added to the window. Depending on the trigger used, a late but not dropped element may cause the window to fire again. This is the case for the EventTimeTrigger.

In order to make this work, Flink keeps the state of windows until their allowed lateness expires. Once this happens, Flink removes the window and deletes its state.

Also another option is SideOoutput i.e. In addition to the main stream that results from DataStream operations, you can also produce any number of additional side output result streams. The type of data in the result streams does not have to match the type of data in the main stream and the types of the different side outputs can also differ. This operation can be useful when you want to split a stream of data where you would normally have to replicate the stream and then filter out from each stream the data that you don’t want to have.

When using side outputs, you first need to define an OutputTag that will be used to identify a side output stream:

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/side_output.html

Upvotes: 4

Robert Metzger
Robert Metzger

Reputation: 4542

Apache Flink has a concept called allowed lateness for the windows to handle data that arrives after a watermark.

Upvotes: 6

Related Questions