pavlos163

Reputation: 2890

Modelling a Kafka cluster

I have an API endpoint that accepts events with a specific user ID and some other data. I want those events broadcasted to some external locations and I wanted to explore using Kafka as a solution for that.

I have the following requirements:

  1. Events with the same UserID should be delivered in order to the external locations.
  2. Events should be persisted.
  3. If a single external location is failing, that shouldn't delay delivery to other locations.

Initially, from some reading I did, it seemed like I would want N consumers, where N is the number of external locations I want to broadcast to. That should fulfill requirement (3). I also probably want a single producer, my API, which pushes events to my Kafka cluster. Requirement (2) should come automatically with Kafka.

I am more confused about how to model the Kafka cluster itself. From the reading I did, it sounds like having millions of topics is bad practice, so a single topic per UserID is not an option. The other option I read about is having one partition per UserID (say M partitions). That would satisfy requirement (1) out of the box, if I understand correctly. But would that also mean I need M brokers? That also sounds unreasonable.

What would be the best way to fulfill all requirements? As a start, I plan on hosting this with a local Kafka cluster.
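For reference, a rough sketch of what I have in mind for the producer side, assuming the confluent-kafka Python client; the topic name "events", the broker address, and the field names are placeholders:

```python
# Hypothetical producer side of the API endpoint (confluent-kafka Python client).
# Topic name, broker address, and payload fields are placeholders.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def publish_event(user_id: str, payload: dict) -> None:
    # Keying by UserID sends all events for the same user to the same
    # partition, which is what would give per-user ordering (requirement 1).
    producer.produce(
        "events",
        key=user_id.encode("utf-8"),
        value=json.dumps(payload).encode("utf-8"),
    )
    producer.poll(0)  # serve delivery callbacks without blocking

# e.g. publish_event("user-42", {"type": "signup"})
producer.flush()
```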

Upvotes: 0

Views: 82

Answers (1)

OneCricketeer

Reputation: 191711

You are correct that one topic per user is not ideal.

Partition count is not dependent upon broker count (a single broker can host many partitions), so a single partitioned topic keyed by UserID is the better design. You also don't need one partition per user: the producer hashes the record key to pick a partition, so all events for a given UserID land on the same partition and stay in order, even when many users share a partition.
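As a minimal sketch (assuming a local single-broker cluster and the confluent-kafka Python AdminClient; the topic name and partition count are placeholders), you can create a topic with far more partitions than brokers:

```python
# Hypothetical topic creation on a 1-broker local cluster.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# 50 partitions on a single broker is valid; the broker simply hosts all 50.
# Replication factor, however, cannot exceed the broker count.
futures = admin.create_topics(
    [NewTopic("events", num_partitions=50, replication_factor=1)]
)
for topic, future in futures.items():
    future.result()  # raises if creation failed
    print(f"created {topic}")
```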

If a single external location is failing, that shouldn't delay delivery to other locations.

This is standard consumer-group behavior, not topic/partition design: give each external location its own consumer group. Each group commits its own offsets independently, so a slow or failing location never holds back delivery to the others.
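A minimal sketch of one such consumer, assuming the confluent-kafka Python client; the group id, topic name, and delivery function are placeholders:

```python
# Hypothetical consumer for one external location. Each location uses its own
# "group.id", so offsets are tracked independently per location.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "location-a",        # a different group id per location
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
})
consumer.subscribe(["events"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(msg.error())
            continue
        deliver_to_location_a(msg.key(), msg.value())  # placeholder delivery call
        consumer.commit(message=msg)  # commit only after successful delivery
finally:
    consumer.close()
```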

Upvotes: 1
