user1258683

Reputation: 59

Kafka message partitioning by key

We have a business process/workflow that starts when an initial event message is received and closes when the last message is processed. Up to 100,000 processes are executed each day. My problem is that the messages belonging to a specific process have to be processed in the order they were received. If one of the messages fails, its process has to freeze until the problem is fixed, while all other processes continue. For this kind of situation I am thinking of using Kafka. The first solution that came to my mind was to partition the topic by message key, using the ProcessId as the key. That way all messages of a process would go to the same partition and Kafka would guarantee their order. As I am new to Kafka, what I managed to figure out is that partitions have to be created in advance, and that makes everything too difficult. So my questions are:

1) When I produce a message to a Kafka topic that does not exist, the topic is created at runtime. Is it possible to have the same behavior for topic partitions?

2) There can be more than 100,000 active partitions on the topic; is that a problem?

3) Can a partition be deleted after all messages from it were read?

4) Maybe you can suggest other approaches to my problem?

Upvotes: 2

Views: 10018

Answers (2)

ADS

Reputation: 718

I suppose you chose the wrong feature to solve your task.

  • In general, partitioning is used for load balancing.
  • Incoming messages are distributed over the given number of partitions according to the partitioning strategy, which is defined when the producer is configured. In short, the default strategy just calculates i = key_hash mod number_of_partitions and puts the message into the i-th partition. You can read more about partitioning strategies in the Kafka documentation.
  • Message ordering is guaranteed only within a partition. With two messages from different partitions, you have no guarantee which one reaches the consumer first.
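The hash-mod idea from the second bullet can be sketched in a few lines. This is only an illustration of the principle, not the real partitioner: Kafka's Java client uses a murmur2 hash, while CRC-32 is used here just to keep the sketch dependency-free.

```python
import zlib

def pick_partition(key: str, num_partitions: int) -> int:
    # Toy stand-in for Kafka's default partitioner: hash the key,
    # then take it modulo the partition count. Real clients use
    # murmur2; CRC-32 is only used here to avoid extra dependencies.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Every message with the same ProcessId maps to the same partition,
# so per-process ordering is preserved:
assert pick_partition("process-42", 12) == pick_partition("process-42", 12)

# Different ProcessIds are spread across the available partitions:
spread = {pick_partition(f"process-{i}", 12) for i in range(1000)}
```

Note that the mapping depends on `num_partitions`, which is why changing the partition count later reshuffles which partition a given key lands in.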

Probably you should use consumer groups instead. A group is a consumer-side option:

  • Each group consumes all messages from the topic independently.
  • A group can consist of one consumer, or more if you need it.
  • You can have many groups and add a new group (in fact, add a new consumer with a new groupId) dynamically.
  • Since you can stop/pause any consumer, you can manually stop all consumers related to a specific group. I suppose there is no single command to do that, but I'm not sure. Anyway, if you have a single consumer in each group, you can stop it easily.
  • If you want to remove a group, you just shut down and drop the related consumers. No action is needed on the broker side.
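The independence of groups can be shown with a toy in-memory model. This is not the Kafka API, only the offset bookkeeping that makes each group see the full stream regardless of what other groups do:

```python
# Toy model: a topic is a list of messages; each consumer group keeps
# its own offset, so every group reads every message independently.
topic = ["start:42", "step:42", "end:42"]

class Group:
    def __init__(self, group_id: str):
        self.group_id = group_id
        self.offset = 0  # per-group position in the topic

    def poll(self):
        """Return the next unread message for this group, or None."""
        if self.offset < len(topic):
            msg = topic[self.offset]
            self.offset += 1
            return msg
        return None

a, b = Group("process-42"), Group("audit")

# Group "a" reads the whole topic...
seen_by_a = [a.poll() for _ in range(len(topic))]

# ...and group "b" still receives the full stream, unaffected by "a":
seen_by_b = [b.poll() for _ in range(len(topic))]
```

Pausing one group here is just not calling its `poll()`, which mirrors the "freeze one process, let the others continue" requirement, at the cost of one group per process.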

As a drawback, you'll get 100,000 consumers reading the (single) topic. That is a heavy network load, at the least.

Upvotes: 2

mukesh210

Reputation: 2892

When I produce a message to a Kafka topic that does not exist, the topic is created at runtime. Is it possible to have the same behavior for topic partitions?

You need to specify the number of partitions while creating a topic. New partitions won't be created automatically (as is the case with topic creation); you have to change the number of partitions using the topic tool.

More info: https://kafka.apache.org/documentation/#basic_ops_modify_topic

As soon as you increase the number of partitions, producers and consumers will be notified of the new partitions, leading them to rebalance. Once rebalanced, producers and consumers will start producing to and consuming from the new partitions.

There can be more than 100,000 active partitions on the topic; is that a problem?

Yes, having that many partitions will increase overall latency. Go through how-choose-number-topics-partitions-kafka-cluster on how to decide the number of partitions.

Can a partition be deleted after all messages from it were read?

Deleting a partition would lead to data loss, and the remaining data's keys would no longer be distributed correctly: new messages would not be directed to the same partitions as existing messages with the same key. That's why Kafka does not support decreasing the partition count of a topic.
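The key-redistribution problem can be sketched with the hash-mod scheme mentioned in the other answer. CRC-32 here is only a stand-in for the clients' real murmur2 hash; the point is that changing the divisor changes where existing keys map:

```python
import zlib

def pick_partition(key: str, num_partitions: int) -> int:
    # Stand-in for the default hash-mod partitioner (real clients use murmur2).
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# If the partition count drops from 12 to 11, many existing keys map to a
# different partition, so old and new messages of one ProcessId get split:
moved = [f"process-{i}" for i in range(100)
         if pick_partition(f"process-{i}", 12) != pick_partition(f"process-{i}", 11)]
```

The same effect applies when *increasing* the count, which is why keyed ordering is only safe if the partition count stays fixed for the life of the topic.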

Also, the Kafka documentation states:

Kafka does not currently support reducing the number of partitions for a topic.

Upvotes: 3
