Reputation: 9357
Say, for example, I have 4 partitions. A message msg1 with key 101 is put into partition 1 (out of 4) and has not been consumed yet. Meanwhile, a new partition is added, making a total of 5 partitions.
The next message msg2 with key 101 then goes to, say, partition 4, because hash(101) % no_of_partitions = 4.
Now, in the Streams API, when a message is looked up by its key, partition 4 will be accessed, because that is the partition produced by hash(101) % no_of_partitions, and so it finds msg2 with key 101 in partition 4.
But what about msg1 with key 101 in partition 1? Is it consumed at all?
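For illustration, here is a minimal sketch of the routing arithmetic described above. Kafka's real default partitioner uses a murmur2 hash of the serialized key rather than this stand-in, but the modulo step is the same; the numbers are hypothetical:

int key = 101;
int hash = Integer.hashCode(key); // stand-in for Kafka's murmur2 hash

int partitionBefore = Math.abs(hash) % 4; // msg1: 4 partitions
int partitionAfter  = Math.abs(hash) % 5; // msg2: 5 partitions

// The two results can differ, so msg1 and msg2 may end up in
// different partitions even though they share the same key.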
Upvotes: 1
Views: 1093
Reputation: 13
It will be consumed, but ordering is not guaranteed, so make sure your application logic is idempotent. One possible solution is to route the data through an intermediate topic that already has more partitions. KStream#through
will let you produce to and consume from that topic in a single instruction; the method does exactly that and returns a KStream again. In pseudo code:
builder.stream(...)
    // potential key transformation, e.g. selectKey(...)
    .through("inner_topic_with_more_partitions")
    .toTable(accountMaterializer);
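A more complete sketch of this pattern, assuming String keys and values and hypothetical topic and store names. The intermediate topic must be created with the desired partition count up front, and note that newer Kafka Streams releases deprecate through() in favor of repartition():

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

StreamsBuilder builder = new StreamsBuilder();

KStream<String, String> stream = builder.stream("input_topic");

KTable<String, String> table = stream
        // a key transformation could go here, e.g. selectKey(...)
        .through("inner_topic_with_more_partitions") // produce + re-consume in one step
        .toTable(Materialized.as("account-store"));  // hypothetical store name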
Upvotes: 0
Reputation: 62350
You won't lose data; however, depending on your application, adding partitions might not be supported and can break your application.
You can add partitions only if your application is stateless. If your application is stateful, it will most likely break and die with an exception.
Also note that Kafka Streams assumes that input data is partitioned by key. Thus, if the partitioning is changed, even if the application does not break, it will most likely compute an incorrect result, because adding a partition violates that partitioning assumption.
One way to approach this issue is to reset your application (cf. the application reset tool). However, this implies that you lose your current application state. Note that resetting does not address the incorrect-partitioning problem either, so your application might still compute incorrect results. To guard against the partitioning problem, you can insert a dummy map()
operation that only forwards the data right after you read it from a topic: this results in data repartitioning if required, as sketched below, and thus fixes the key-based partitioning.
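A minimal sketch of that dummy map() trick, with hypothetical topic names and String keys/values assumed. Because map() could in principle change the key, Kafka Streams flags the stream for repartitioning and inserts an internal repartition topic before the next key-based (stateful) operation:

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> input = builder.stream("input_topic");

// Forward key and value unchanged; since map() may change the key,
// the stream is marked for repartitioning.
KStream<String, String> remapped =
        input.map((key, value) -> KeyValue.pair(key, value));

// The next stateful operation (here a count) forces the actual
// repartitioning through an internal topic, restoring correct
// key-based partitioning.
remapped.groupByKey().count();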
Upvotes: 3
Reputation: 1199
msg1 with key 101 in partition 1 will be consumed.
In Kafka Streams, you do not "consume a message by its key": every message in every partition is consumed. Any filtering by key has to happen in the code of the Kafka Streams application itself.
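For example, a minimal sketch of such key-based filtering in application code (the topic name and key literal are hypothetical):

StreamsBuilder builder = new StreamsBuilder();

builder.<String, String>stream("input_topic")
        // every record from every partition flows through here;
        // filtering by key is an explicit application-level step
        .filter((key, value) -> "101".equals(key))
        .foreach((key, value) -> System.out.println(key + " -> " + value));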
Upvotes: 0