Abhishek

Reputation: 637

Kafka persistent state store vs. in-memory state store

We are using a Kafka Streams persistent state store, and it often runs out of disk space (8 GB), so we are thinking of moving to an in-memory state store, i.e. changing

Stores.persistentKeyValueStore("name");

to

Stores.inMemoryKeyValueStore("name");

We have a few questions about switching to in-memory:

  1. Do we lose any data if a broker or consumer restarts?
  2. How does the consumer get the previous data if the old data has been flushed from memory? Does it fetch that data from the broker?
  3. If it has to fetch the data from the broker, isn't that a performance hit, since it is a network call rather than a local read as with a persistent state store?

Are there any other disadvantages of switching to in-memory?

Please note that we have streaming applications (KTable) with around 2M unique messages.

Each message is around 2 KB, and the average rate is 500 messages/sec.

Upvotes: 5

Views: 6103

Answers (1)

Matthias J. Sax

Reputation: 62285

runs out of space (8gb) so we are thinking of moving to in memory state store

Switching to in-memory stores seems like a step backwards. 8 GB is also rather small; why do you have such small disks?

Do we lose any data in case of broker/consumer restart?

No. Persistent stores are just an optimization: they reduce startup time and can hold larger state (as they can spill to disk). Both persistent and in-memory stores are backed by a changelog topic in the Kafka cluster for fault tolerance. For proper fault tolerance, you need to apply the same configuration to Kafka Streams and to the changelog topic, independent of the store type.

How does consumer get the previous data in case the old data from memory is flushed, does it get that data from broker?

If you use an in-memory store, the client always holds a full copy of the data set. Hence, your data set must fit into main memory. The writes to the Kafka cluster are for fault tolerance only. During normal operation, Kafka Streams only writes to the changelog topics. Changelog topics are only read if a task is migrated and the store needs to be rebuilt.
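To make the write-only versus restore distinction concrete, here is a toy model in plain Java. It is only a sketch: `ChangelogBackedStore` and everything in it are hypothetical names for illustration and are not part of the Kafka Streams API. A list stands in for the changelog topic on the brokers, and a map stands in for the local in-memory store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical class, NOT the Kafka Streams API.
class ChangelogBackedStore {
    // Stands in for the changelog topic on the brokers; outlives the client.
    private final List<String[]> changelog;
    // Stands in for the in-memory store; gone whenever the client stops.
    private final Map<String, String> local = new HashMap<>();
    // Counts changelog records read, to show that reads happen only on restore.
    private int changelogReads = 0;

    ChangelogBackedStore(List<String[]> changelog) {
        this.changelog = changelog;
        // Restore path: replay the changelog once to rebuild the local state.
        for (String[] record : changelog) {
            local.put(record[0], record[1]);
            changelogReads++;
        }
    }

    public void put(String key, String value) {
        local.put(key, value);                    // update the local copy
        changelog.add(new String[] {key, value}); // append for fault tolerance
    }

    public String get(String key) {
        return local.get(key); // always served locally, never from the changelog
    }

    public int changelogReads() {
        return changelogReads;
    }

    public static void main(String[] args) {
        List<String[]> brokerLog = new ArrayList<>();
        ChangelogBackedStore first = new ChangelogBackedStore(brokerLog);
        first.put("a", "1");
        first.put("a", "2");
        first.put("b", "3");

        // Simulate a restart: the old local map is discarded, the log survives.
        ChangelogBackedStore restarted = new ChangelogBackedStore(brokerLog);
        System.out.println(restarted.get("a"));         // prints 2
        System.out.println(restarted.changelogReads()); // prints 3
    }
}
```

In this model every `get` is a local lookup, and the log is only replayed in the constructor, which mirrors the point that lookups do not turn into network calls during normal operation.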

Is there any other disadvantage of switching to in memory?

As mentioned, the disadvantages are:

  - You lose the local state on a rolling restart, so the state has to be recovered from the changelog topic, which increases startup time.
  - Your state must fit into main memory.
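As a rough check on whether the state fits into memory, the figures from the question (about 2M unique messages of about 2 KB each) can be multiplied out. The sketch below is a back-of-the-envelope estimate only: `StateSizeEstimate` is a hypothetical helper, 2 KB is assumed to mean 2048 bytes, and the result covers raw values only, ignoring keys, JVM object overhead, and how the state is spread across application instances.

```java
// Back-of-the-envelope estimate; hypothetical helper, not a Kafka Streams API.
class StateSizeEstimate {
    // Raw payload bytes = number of unique keys x average value size in bytes.
    static long rawBytes(long uniqueKeys, long bytesPerValue) {
        return uniqueKeys * bytesPerValue;
    }

    // Convert bytes to GiB (1 GiB = 1024^3 bytes).
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Figures taken from the question; 2 KB assumed to be 2048 bytes.
        long total = rawBytes(2_000_000L, 2L * 1024L);
        System.out.println("raw payload bytes: " + total);
        System.out.println("raw payload GiB:   " + toGiB(total));
    }
}
```

That comes to 4,096,000,000 bytes, or roughly 3.8 GiB of raw values, so the real in-memory footprint would be somewhat larger than that once overhead is included.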

Upvotes: 5
