Reputation: 3173
I am creating a 30-minute de-duplication store for a Kafka Streams application, loosely based on this Confluent code (to solve a different problem from the one Kafka's exactly-once processing guarantee addresses), and I want to minimise topology startup time.
This code makes use of a persistent window store, which requires that I specify the number of log segments to use. Assuming I want to use 2 segments, and am using the default segment size of 1GB, does this mean that during rebalancing the client will have to read 2GB of data before the application launches?
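For context, here is a minimal sketch of how such a store might be defined, assuming the pre-2.1 Stores API where the segment count is passed directly; the store name, serdes, and window size here are illustrative:

```java
import java.util.concurrent.TimeUnit;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.WindowStore;

public class DedupStoreDefinition {

    // 30-minute de-duplication window, as described above.
    static final long RETENTION_MS = TimeUnit.MINUTES.toMillis(30);
    static final int NUM_SEGMENTS = 2; // the parameter in question (minimum allowed is 2)

    public static StoreBuilder<WindowStore<String, Long>> dedupStoreBuilder() {
        return Stores.windowStoreBuilder(
                Stores.persistentWindowStore(
                        "dedup-store",   // illustrative store (and changelog) name
                        RETENTION_MS,    // how long records are retained
                        NUM_SEGMENTS,    // number of segments
                        RETENTION_MS,    // window size
                        false),          // do not retain duplicates
                Serdes.String(),
                Serdes.Long());
    }
}
```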
Upvotes: 1
Views: 334
Reputation: 62330
The segments parameter configures something different in Kafka Streams -- it is not related to log segments on the brokers (it just shares the name).
For a windowed store, the retention time of the store is divided by the number of segments. If all data in a segment is older than the retention time, the complete segment is dropped and a new, empty segment is created. These segments only exist client-side.
The number of records that needs to be restored depends only on the retention time (and your input data rate). It is independent of segment size; segment size only defines how fine-grained older records are expired.
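To make the arithmetic concrete, here is a small sketch using the hypothetical numbers from the question (30-minute retention, 2 segments):

```java
public class SegmentArithmetic {

    public static void main(final String[] args) {
        final long retentionMs = 30 * 60 * 1000L; // the 30-minute retention from the question
        final int numSegments = 2;

        // Each client-side segment covers retention / numSegments of time,
        // so with 2 segments each one spans 15 minutes.
        final long segmentIntervalMs = retentionMs / numSegments;
        System.out.println("Segment interval: " + segmentIntervalMs + " ms");

        // Restoration after a rebalance replays at most ~30 minutes' worth of
        // changelog records (times your input rate). The broker-side 1GB log
        // segment size plays no role here.
    }
}
```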
Upvotes: 3