Reputation: 147
I am a little confused about punctuated vs. periodic watermarking in Apache Flink.
Suppose I have a DataStream of POJOs whose timestamp field is always in ascending order.
So it would be something like [{id: 1, ts: 12}, {id: 2, ts: 13}, ... , {id: 5, ts: 233445}]
Which type of watermark assigner should I use in this case? Should I use the AscendingTimestampExtractor, or create a custom punctuated one?
Upvotes: 1
Views: 278
Reputation: 43632
I've never encountered a situation in production that called for punctuated watermarking, but I do sometimes use it when I'm experimenting and want explicit control over when watermarks are inserted into the stream. For example, emitting a watermark after every event is a bad idea in production because of the overhead involved, but it is easily done with punctuated watermarks, and it makes it easy to cause timers to fire at specific points in the stream. While it is reasonable to want this level of control in tests, there are better ways to get it (e.g., using some sort of test harness).
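To make the timer point concrete, here is a minimal plain-Java simulation. It is not Flink code. It assumes ascending timestamps, a watermark of "largest timestamp seen so far minus 1" (the convention the ascending-timestamp assigners use), and a timer that fires once the watermark reaches its timestamp. The `fireIndex` helper and the `everyN` parameter are hypothetical names for illustration; real periodic watermarks are emitted on a wall-clock interval, so counting events is only a stand-in.

```java
class WatermarkTimerSketch {

    /**
     * Returns the index of the event after which a timer registered for
     * timerTs would fire, if a watermark is emitted after every everyN-th
     * event. Returns -1 if no emitted watermark ever reaches timerTs.
     */
    static int fireIndex(long[] timestamps, long timerTs, int everyN) {
        long maxTs = Long.MIN_VALUE;
        for (int i = 0; i < timestamps.length; i++) {
            maxTs = Math.max(maxTs, timestamps[i]);
            // Assumption: event count stands in for the periodic interval.
            boolean emitWatermark = (i + 1) % everyN == 0;
            if (emitWatermark) {
                long watermark = maxTs - 1; // ascending-timestamp convention
                if (watermark >= timerTs) {
                    return i; // the timer fires right after this event
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] ts = {12, 13, 20, 25, 31, 40};
        // Watermark after every event: fires as early as possible.
        System.out.println(fireIndex(ts, 20, 1)); // prints 3
        // Watermark after every third event: fires later.
        System.out.println(fireIndex(ts, 20, 3)); // prints 5
    }
}
```

With a watermark after every event, the point at which the timer fires depends only on the data, which is what makes this convenient for experiments.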
The design intent behind punctuated watermarks is for cases where there are special events in the stream that are meant to be used as signals for watermarking. For example, an upstream job may have already watermarked the stream and written it out to Kafka with watermarks included, or some of the events may come from a device with a trusted clock while others do not.
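The trusted-clock case can be sketched the same way, again as a plain-Java simulation rather than Flink code. The `Event` class, its `trustedClock` flag, and `watermarksFor` are hypothetical names. The sketch also assumes that a trusted event's own timestamp can be used directly as the watermark. In Flink itself this decision would live in the per-event callback of a watermark generator (`WatermarkGenerator.onEvent` in recent versions, `AssignerWithPunctuatedWatermarks` in older ones).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PunctuatedWatermarkSketch {

    static final class Event {
        final int id;
        final long ts;
        final boolean trustedClock; // hypothetical marker flag

        Event(int id, long ts, boolean trustedClock) {
            this.id = id;
            this.ts = ts;
            this.trustedClock = trustedClock;
        }
    }

    /** Returns the watermarks that would be emitted, in stream order. */
    static List<Long> watermarksFor(List<Event> events) {
        List<Long> emitted = new ArrayList<>();
        long current = Long.MIN_VALUE;
        for (Event e : events) {
            // Only trusted events act as watermark signals, and a watermark
            // is emitted only if it advances past the previous one.
            if (e.trustedClock && e.ts > current) {
                current = e.ts;
                emitted.add(current);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(1, 12, false),
                new Event(2, 13, true),
                new Event(3, 14, false),
                new Event(4, 20, true),
                new Event(5, 18, true)); // trusted, but would move time backwards
        System.out.println(watermarksFor(events)); // prints [13, 20]
    }
}
```

Untrusted events never move the watermark, and the trusted event with ts 18 is skipped because watermarks must not go backwards.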
Upvotes: 2