Reputation: 1254
I'm creating a Flink application that simply forwards windowed incoming Kafka events to another Kafka topic, adding a start and an end marker for each window. So, for example, for a one-hour window containing 1, 2, 3, 4, 5, I will sink

start_timestamp, 1, 2, 3, 4, 5, end_timestamp

into a different Kafka topic. There may be other transforms later on, but in general, for N events coming in, I'm always going to emit at least N+2 events.
As I understand it, using windowAll() with a ProcessAllWindowFunction that injects the start and end markers should do this.
My question is around state management. I'll be using the RocksDB state backend: will it also keep persisting internal window state even for this non-keyed stream? My main concern is being able to keep the state in the window so that I'm not reprocessing it again, especially for large windows.
Upvotes: 0
Views: 798
Reputation: 43707
I prefer @kkrugler's approach, since it avoids the cost of keeping all that state around. But to answer your question: yes, a windowAll can use the RocksDB state backend to persist its contents. Under the hood, a windowAll is actually a keyed window with a special constant key, so even though RocksDB can only be used to manage keyed state, it works.
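Enabling the RocksDB backend is just configuration; in flink-conf.yaml it would look roughly like the fragment below (the checkpoint path is a placeholder you'd replace with your own):

```yaml
# flink-conf.yaml -- enable the embedded RocksDB state backend
state.backend: rocksdb
state.backend.incremental: true          # checkpoint only SST-file deltas
state.checkpoints.dir: hdfs:///flink/checkpoints   # placeholder path
```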
Upvotes: 1
Reputation: 9265
For something this simple, I'd use a FlatMap (with parallelism set to 1) that keeps in state the current window's hour and the last event time. Whenever a record arrives, if it's in a new hourly window, I'd emit the end_timestamp (the last event time), then the start_timestamp (from the new record), and update the saved state's current hour. In all cases, the last event time in state is also updated. This assumes your incoming events are strictly ordered, so you don't have to worry about late data.
Upvotes: 1