Reputation: 647
I have a state store (using Materialized.as()
) in Kafka streams word-count application.
Based on my understanding the state-store is maintained in Kafka internal topic.
Please post along with the sources for solution if available.
Upvotes: 1
Views: 2920
Reputation: 15067
Can state-stores have unlimited key-value pairs, or they are governed by the rules of kafka topics based on the log.retention policies or log.segment.bytes?
Yes, state stores can have unlimited key-value pairs = events (or 'messages'). Well, local app storage space and remote storage space in Kafka permitting, of course (the latter for durably storing the data in your state stores).
Your application's state stores are persisted remotely in compacted internal Kafka topics. Compaction means that Kafka periodically purges older events for the same event key (e.g., Bob's old account balances) from storage. But compacted topics do not remove the most recent event per event key (e.g., Bob's current account balance). There is no upper limit for how many such 'unique' key-value pairs will be stored in a compacted topic.
I set the log.retention.ms=60000 and expected the state store value to be reset to 0 after a minute. But I find that it is not happening, I can still see values from state store.
log.retention.ms
is not used when a topic is configured to be compacted (log.cleanup.policy=compact
). See the existing SO question Log compaction to keep exactly one message per key for details, including for why compaction doesn't happen immediately (in short that's because compaction operates on partition segment files, it will not touch the most current segment file, and there can be multiple events per event key in that file).
Note: You can nowadays set the configuration log.cleanup.policy
to a combination of compaction and time/volume-based retention with log.cleanup.policy=compact,delete
(see KIP-71 for details). But generally you shouldn't fiddle with this setting unless you really know what you are doing -- the defaults are what you need 99% of the time.
Does kafka completely wipe out the logs or keeps the SNAPSHOT in case log-compaction topic? What does it mean by "segment gets committed"?
I don't understand this question, unfortunately. :-) Perhaps my previous answers and reference links already cover your needs. What I can say is that, no, Kafka doesn't wipe out the logs completely. Compaction operates on a topic-partition's segment files. You will probably need to read up on how compaction works, for which I would suggest an article like https://medium.com/@sunny_81705/kafka-log-retention-and-cleanup-policies-c8d9cb7e09f8, in case the Apache Kafka docs weren't sufficiently clear yet.
Upvotes: 6
Reputation: 191743
State stores are maintained by compacted, internal topics. They therefore follow the same semantics of compacted topics, and have to finite retention duration
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Internal+Data+Management
Upvotes: 1