balaji
balaji

Reputation: 93

Apache Flink: How to process all the events that are behind a watermark?

I want to use Flink's event timestamp and plan to implement a simple emitWatermark which is System.currentTimeInMillis - 10 secs. My understanding is tumbling window will fire start_time + window_interval + 10 secs. So if events arrives later than the watermark those events will be dropped.

Is there a way to write all the dropped events by Flink to a sink like S3?

Upvotes: 2

Views: 1418

Answers (1)

Ruslan Kovalov
Ruslan Kovalov

Reputation: 46

It should be achievable with Side Outputs. The documentation of the sideOutputLateData operator states the following:

Send late arriving data to the side output identified by the given {@link OutputTag}. Data is considered late after the watermark has passed the end of the window plus the allowed lateness set using {@link #allowedLateness(Time)}.

So then you can get the late data stream by the output tag and sink it to s3.

Upvotes: 3

Related Questions