Scaling Kafka for Microservices

Question

Problem

I want to understand how I need to design the message passing for microservices to be still elastic and scalable.

Goal

Microservices allow an elastic number of instances which are scaled up and down automatically based on the current load. This characteristic should not be limited by Kafka.
We need to guarantee an at-least-once delivery
We need to guarantee the delivery order of events concerning the same entity

Running example

For simplicitly, let's say there are 2 microservices A and B;
A1, A2 being instances of microservices A; B1 and B2 instances of microservices B
A1 and A2 publish events describing CRUD operations on entities of A, like entity was created, updated, deleted.

Design I

A1 publishes events under topic a.entity.created including information like the id of the entity that was created in the message body.

A1 has to specify how many partitions (p) there will be in order to allow parallel consumption of consumers. This will allow for scalability.

B1 and B2 subscirbe to the topic a.entity.created as a consumer group b-consumers. This leads to a load distribution between instances of B.

Questions:

a) Events concerning the same entity might be processed in parallel and might get out of order, right?

b) This will lead to instances of B dealing with requests in parallel. The limitting factor is how p (amount of partitions) is defined by the producers. If there are 5 partitions, but I'd need 8 consumers to cope with the load, it won't work since 3 consumers will not receive events. Did I understand this correctly? This would IMO be unusable for elastic microservices which might want to scale further.

Design II

A1 publishes events under topic a.entity.created.{entityId} leading to a lot of different topics.

Each partition size is set to 1.

B1 and B2 subscribe to the topic a.entity.created.* with a wildcard as a consumer group b-consumers. This leads to a load distribution between instances of B.

Questions:

c) Events concering the same entity should be guaranteed to be delivered in order since there is only one partition, right?

d) How is scalability handled here? The amount of partitions shouldn't limit the amount of consumers, or does it?

~~e) Is there a better way to guarantee the goals mentioned above?~~

Design III (thx to StuartLC)

A1 publishes event under topic a.entity.created and partition key based on the entityId.
Each partition size is set to 10.
B1 and B2 subscirbe to the topic a.entity.created as a consumer group b-consumers. This leads to a load distribution between instances of B. Events regarding the same entity will be delivered in order thanks to the partition key.

Questions:

f) With p=10 I can have max 10 consumers. This means I have to estimate the number of consumers at design time / deploy time if using env variables. Can I move that to runtime anyhow so that is adjusted dynamically?

Scaling Kafka for Microservices

Problem

Goal

Running example

Design I

Design II

Design III (thx to StuartLC)

Answers (1)

Related Questions