0x SLC

Reputation: 165

How does KStreams handle state store data when adding additional partitions?

I have one partition of data with one app instance and one local state store. It has been running for a while and has accumulated a lot of state. I need to scale that to 5 partitions with 5 app instances. What happens to the one local state store when the partitions are added and the app is brought back online? Do I have to delete the local state store and start over? Will the state store be shuffled across the additional app instance state stores automatically according to the partitioning strategy?

Upvotes: 0

Views: 738

Answers (1)

Matthias J. Sax

Reputation: 62285

Do I have to delete the local state store and start over?

That is the recommended way to handle it (cf. https://docs.confluent.io/platform/current/streams/developer-guide/app-reset-tool.html). In fact, if you change the number of input topic partitions and restart your application, Kafka Streams would fail with an error, because the state store has only one shard, while 5 shards would be expected given that you now have 5 input topic partitions.

Will the state store be shuffled across the additional app instance state stores automatically according to the partitioning strategy?

No. Note that this also applies to the data in your input topic. Thus, if you plan to partition your input data by key (i.e., when writing into the input topic upstream), old records would remain in the existing partition and thus would not be partitioned properly.
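To illustrate why old records end up "in the wrong place", here is a minimal sketch of key-based partitioning. It is not Kafka's actual partitioner: Kafka's default hashes the serialized key with murmur2, while the hypothetical `partitionFor` below uses a simple polynomial hash as a stand-in. The point is only that the target partition depends on the partition count, so a key written while there was 1 partition can map to a different partition once there are 5.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class RepartitionDemo {

    // Stand-in for a key-based partitioner: hash the key bytes and take the
    // result modulo the partition count. ASSUMPTION: Kafka's default
    // partitioner uses murmur2 instead of this simple polynomial hash.
    static int partitionFor(String key, int numPartitions) {
        int hash = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        return Math.floorMod(hash, numPartitions);
    }

    // Keys whose target partition differs between the old and new partition
    // count. Records already written for these keys stay where they are,
    // because existing topic data is never moved.
    static List<String> misplacedKeys(List<String> keys, int oldCount, int newCount) {
        List<String> misplaced = new ArrayList<>();
        for (String key : keys) {
            if (partitionFor(key, oldCount) != partitionFor(key, newCount)) {
                misplaced.add(key);
            }
        }
        return misplaced;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("user-1", "user-2", "user-3", "user-4");
        for (String key : keys) {
            System.out.println(key + ": 1 partition -> " + partitionFor(key, 1)
                    + ", 5 partitions -> " + partitionFor(key, 5));
        }
        System.out.println("misplaced after 1 -> 5: " + misplacedKeys(keys, 1, 5));
    }
}
```

With a single partition every key maps to partition 0, so any key whose new target is not partition 0 has old records (and old state) that no longer line up with where new records for that key will go.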

In general, it is recommended to over-partition your input topics upfront, so that you do not need to change the number of partitions later on. Thus, you might also consider going up to 10 or even 20 partitions instead of just 5.
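As a rough sketch of why over-partitioning helps: the number of input partitions caps how many instances can do work, so extra partitions leave headroom to scale out later without touching the topic. The example assumes one task per input partition and a simple round-robin spread; the real Kafka Streams assignor is more sophisticated, and `tasksPerInstance` and `busyInstances` are hypothetical helper names.

```java
import java.util.Arrays;

final class ParallelismDemo {

    // Instances that receive at least one task, ASSUMING one task per
    // input partition (a simple single-input-topic topology).
    static int busyInstances(int partitions, int instances) {
        return Math.min(partitions, instances);
    }

    // Round-robin spread of tasks over instances. ASSUMPTION: this is an
    // illustration only, not the actual Kafka Streams assignment algorithm.
    static int[] tasksPerInstance(int partitions, int instances) {
        int[] counts = new int[instances];
        for (int p = 0; p < partitions; p++) {
            counts[p % instances]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1 partition, 5 instances: four instances sit idle.
        System.out.println(Arrays.toString(tasksPerInstance(1, 5)));
        // 20 partitions, 5 instances: work is spread evenly, and up to
        // 20 instances could be added later without repartitioning.
        System.out.println(Arrays.toString(tasksPerInstance(20, 5)));
    }
}
```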

Upvotes: 1
