David Anderson
David Anderson

Reputation: 43514

Does my Flink application need watermarks? If not, do I need WatermarkStrategy.noWatermarks?

I'm not sure if my Flink application actually requires Watermarks. When are they necessary?

And if I don't need them, what is the purpose of WatermarkStrategy.noWatermarks()?

Upvotes: 7

Views: 1528

Answers (1)

David Anderson
David Anderson

Reputation: 43514

A Watermark for time t marks a location in a data stream and asserts that the stream, at that point, is now complete up through time t.

The only purpose watermarks serve is to trigger the firing of event-time-based timers.

Event-time-based timers are directly exposed by the KeyedProcessFunction API, and are also used internally by

  • event-time Windows
  • the CEP (pattern-matching) library, which uses Watermarks to sort the incoming stream(s) if you specify you want to do event-time based processing
  • Flink SQL, again only when doing event-time-based processing: e.g., ORDER BY, versioned table joins, windows, MATCH_RECOGNIZE, etc.

Common cases where you don't need watermarks include applications that only rely on processing time, or when doing batch processing. Or when processing data that has timestamps, but never relying on event-time timers (e.g., simple event-by-event processing).

Flink's new source interface, introduced by FLIP-27, does require a WatermarkStrategy:

env.fromSource(source, watermarkStrategy, sourceName);

In cases where you don't actually need watermarks, you can use WatermarkStrategy.noWatermarks() in this interface.

Upvotes: 10

Related Questions