Reputation: 3164
Say we have 3 kafka partitions for one topic, and I want my events to be windowed by the hour, using event time.
Will the kafka consumer stop reading from a partition when it is outside of the current window? Or does it open a new window? If it is opening new windows then wouldn't it be theoretically possible to have it open a unlimited amount of windows and thus run out of memory, if one partition's event time would be very skewed compared to the others? This scenario would especially be possible when we are replaying some history.
I have been trying to get this answer from reading documentation, but can not find much about the internals of Flink with Kafka on partitions. Some good documentation on this specific topic would be very welcome.
Thanks!
Upvotes: 1
Views: 1782
Reputation: 3422
So first of all events from Kafka are read constantly and the further windowing operations have no impact on that. There are more things to consider when talking about running out-of-memory.
Some more on how Kafka consumer interacts with EventTime (watermarks in particular you can check here
Upvotes: 1
Reputation: 452
You could try to use this type of style
public void runStartFromLatestOffsets() throws Exception {
// 50 records written to each of 3 partitions before launching a latest-starting consuming job
final int parallelism = 3;
final int recordsInEachPartition = 50;
// each partition will be written an extra 200 records
final int extraRecordsInEachPartition = 200;
// all already existing data in the topic, before the consuming topology has started, should be ignored
final String topicName = writeSequence("testStartFromLatestOffsetsTopic", recordsInEachPartition, parallelism, 1);
// the committed offsets should be ignored
KafkaTestEnvironment.KafkaOffsetHandler kafkaOffsetHandler = kafkaServer.createOffsetHandler();
kafkaOffsetHandler.setCommittedOffset(topicName, 0, 23);
kafkaOffsetHandler.setCommittedOffset(topicName, 1, 31);
kafkaOffsetHandler.setCommittedOffset(topicName, 2, 43);
Upvotes: 0