akelleci

Reputation: 5

Kafka Streams RocksDB large state

Is it okay to hold large state in RocksDB when using Kafka Streams? We are planning to use RocksDB as an event store to hold billions of events indefinitely.

Upvotes: 0

Views: 781

Answers (2)

cmcnealy

Reputation: 362

Yes, you can store a lot of state there, but there are some considerations:

  • The entire state is also replicated to the changelog topics, so your brokers need enough disk space for it. Note that KIP-405 (Tiered Storage) will NOT mitigate this, because tiered storage does not apply to compacted topics.
  • As @OneCricketeer mentioned, rebuilding the state can take a long time after a crash. However, you can mitigate this in several ways:
    • Use a persistent store and restart the application on a node with access to the same disk (StatefulSet + PersistentVolume in K8s works).
      • With exactly-once semantics, until KIP-844 is implemented, the state is still rebuilt from scratch after an unclean shutdown. Once that PR is merged, only a small amount of data will have to be replayed.
    • Have standby replicas. They enable failover as soon as the consumer session timeout expires after a Kafka Streams instance crashes.
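As a rough illustration of the two mitigations above, here is a minimal sketch of the relevant Kafka Streams settings, built with plain `java.util.Properties` so that it runs without the Kafka client on the classpath. The config keys (`state.dir`, `num.standby.replicas`, `processing.guarantee`) are real Kafka Streams settings; the class name, application id, broker address, and directory are made up for the example, and `exactly_once_v2` assumes a recent Kafka Streams version.

```java
import java.util.Properties;

public class LargeStateConfigSketch {

    // Builds settings relevant to large local state.
    // stateDir should point at a disk that survives restarts
    // (for example a PersistentVolume mount in K8s).
    static Properties build(String stateDir) {
        Properties props = new Properties();
        props.put("application.id", "event-store-app");    // hypothetical name
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical address
        // Where RocksDB keeps its files; reusing this directory after a
        // restart avoids replaying the whole changelog.
        props.put("state.dir", stateDir);
        // Keep one warm copy of each state store on another instance.
        props.put("num.standby.replicas", "1");
        // Only needed if you want exactly-once processing.
        props.put("processing.guarantee", "exactly_once_v2");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("/var/lib/kafka-streams"); // hypothetical path
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a real application you would pass these properties to the `KafkaStreams` constructor together with your topology. Note that each standby replica holds another full copy of the state it shadows, so it costs local disk too.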

Upvotes: 2

OneCricketeer

Reputation: 191728

The main limitation would be disk space, so sure, it can be done. But if the app crashes for any reason, you might wait a long time for it to rebuild its state.
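To get a feel for the disk space involved, here is a back-of-the-envelope sketch. It assumes every event has a unique key (so changelog compaction removes nothing) and ignores compression and RocksDB overhead; the class, the method names, and the numbers are made up for illustration.

```java
public class DiskEstimateSketch {

    // Bytes needed across all brokers for the changelog topics:
    // one copy of the state per topic replica.
    static long changelogBytes(long events, long avgBytesPerEvent, int replicationFactor) {
        long raw = Math.multiplyExact(events, avgBytesPerEvent);
        return Math.multiplyExact(raw, (long) replicationFactor);
    }

    // Bytes needed across all Streams instances for local RocksDB stores:
    // the active copy plus one copy per standby replica.
    static long localStoreBytes(long events, long avgBytesPerEvent, int standbyReplicas) {
        long raw = Math.multiplyExact(events, avgBytesPerEvent);
        return Math.multiplyExact(raw, (long) (1 + standbyReplicas));
    }

    public static void main(String[] args) {
        long events = 5_000_000_000L; // hypothetical: 5 billion events
        long avgBytes = 200L;         // hypothetical: 200 bytes per event
        System.out.println("changelog bytes: " + changelogBytes(events, avgBytes, 3));
        System.out.println("local store bytes: " + localStoreBytes(events, avgBytes, 1));
    }
}
```

With these example numbers the raw state is about 1 TB, so a replication factor of 3 needs roughly 3 TB on the brokers, and one standby replica brings local storage to roughly 2 TB across the Streams instances.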

Upvotes: 1
