Jake Ng
Jake Ng

Reputation: 11

Is it possible to have a window per sub-task/partition

I am working with Flink using data from a Kafka topic that has multiple partitions. Is it possible to have a window on each parallel sub-task/partition without having to use keyBy (as I want to avoid the shuffle). Based on the documentation, I can only choose between keyed windows (which requires a shuffle) or global windows (which reduces parallelism to 1).

The motivation is that I want to use a CountWindow to batch the messages with a custom trigger that also fires after a set amount of processing time. So per Kafka partition, I want to batch N records together or wait X amount of processing time before sending the batch downstream.

Thanks!

Upvotes: 1

Views: 283

Answers (1)

David Anderson
David Anderson

Reputation: 43707

There's no good way to do that.

One workaround would be to implement the batching and timeout logic in a custom sink. You'd want to implement the CheckpointedFunction interface to make your solution fault tolerant, and you could use the Sink.ProcessingTimeService.ProcessingTimeCallback interface for the timeouts.

UPDATE:

Just thought of another solution, similar to the one in your comment below. You could implement a custom source that sends a periodic heartbeat, and broadcast that to a BroadcastProcessFunction.

Upvotes: 1

Related Questions