Jean-Louis Lamezec

Reputation: 21

How flink deal with lateness when timestamp is assigned at source?

What I understand so far is that there are 3 ways of dealing with late data in Flink: watermarks that allow for bounded out-of-orderness, allowedLateness on windows, and side outputs for late data.

I don't quite understand what happens to late events for non-window operators, especially when the timestamp is assigned at the source. Here I have a FlinkKafkaConsumer:

import java.time.Duration
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

new FlinkKafkaConsumer[String](
  liveTopic,
  deserializer,
  config.toProps
).setStartFromTimestamp(startOffsetTimestamp)
  // watermarks lag 20 seconds behind the largest timestamp seen so far
  .assignTimestampsAndWatermarks(
    WatermarkStrategy
      .forBoundedOutOfOrderness[String](Duration.ofSeconds(20))
  )

If some data is out of order inside my Kafka partition, say 1 minute late in terms of the timestamp attached to the record, will this data be discarded when consumed by Flink? Can I configure some kind of allowedLateness (as with window operators)?

Upvotes: 1

Views: 842

Answers (1)

David Anderson

Reputation: 43717

The only operators that drop late events are those that must make a time-based decision about how to process each event. Thus, by default, event-time windows and CEP drop late events (CEP does so because it must first do a time-based sort of the event stream, and late events have missed their chance to be sorted into place). In both cases, these APIs offer a stream of late events as a side output.
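
For instance, with the DataStream window API the late events can be captured like this (a minimal sketch; the stream, the one-minute window, and the tag name are assumptions, not part of the question):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// hypothetical stream of (key, value) pairs with timestamps/watermarks already assigned
val events: DataStream[(String, Long)] = ???

val lateTag = OutputTag[(String, Long)]("late-events")

val summed = events
  .keyBy(_._1)
  .window(TumblingEventTimeWindows.of(Time.minutes(1)))
  .allowedLateness(Time.seconds(30)) // keep window state around for 30s past the watermark
  .sideOutputLateData(lateTag)       // anything later than that goes to the side output
  .sum(1)

// the late events are then available as their own stream
val lateEvents = summed.getSideOutput(lateTag)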

Flink SQL's temporal operators also drop late events. So far the Table/SQL API doesn't offer any way to capture or accommodate those late events (without resorting to the DataStream API).
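
To make that concrete, an event-time windowed aggregation in Table/SQL silently discards rows whose timestamps are already behind the watermark, with no allowedLateness or side-output equivalent (a sketch; the events table and its ts column are assumptions):

import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = StreamTableEnvironment.create(env)

// assumes a registered table `events` with a watermarked timestamp column `ts`;
// rows arriving behind the watermark are dropped, and can't be recovered here
val counts = tableEnv.sqlQuery(
  """SELECT window_start, window_end, COUNT(*) AS cnt
    |FROM TABLE(TUMBLE(TABLE events, DESCRIPTOR(ts), INTERVAL '1' MINUTES))
    |GROUP BY window_start, window_end""".stripMargin)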

All other operators simply process events without paying any attention to their lateness. And in a ProcessFunction you can examine each event's timestamp, compare it to the current watermark, and make your own decision about how to handle late events.
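
For example (a minimal sketch; the class name and the choice to tag late events rather than drop them are assumptions):

import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

class LateAwareFunction extends ProcessFunction[String, String] {
  override def processElement(
      event: String,
      ctx: ProcessFunction[String, String]#Context,
      out: Collector[String]): Unit = {
    val ts = ctx.timestamp()                              // the event's timestamp (null if none assigned)
    val watermark = ctx.timerService().currentWatermark() // the current watermark
    if (ts != null && ts < watermark) {
      // the event is late: you decide what to do -- here it's just tagged
      out.collect(s"late: $event")
    } else {
      out.collect(event)
    }
  }
}

Applied with something like events.process(new LateAwareFunction), this passes every event through while marking those that arrived behind the watermark, instead of dropping them.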

Upvotes: 2
