Reputation: 5020
Is the number of partitions a performance bottleneck if the number of Kafka consumers is (much) greater than the number of partitions?
Let's say I have a topic named the-topic with only three partitions.
Now, I have the app below in order to consume from the topic:
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.kafka.receiver.KafkaReceiver;

@Service
public class MyConsumer implements CommandLineRunner {

    @Autowired
    private KafkaReceiver<String, String> kafkaReceiver;

    @Autowired
    private MyReactiveRepository myReactiveRepository;

    @Override
    public void run(String... args) {
        myConsumer().subscribe();
    }

    public Flux<String> myConsumer() {
        return kafkaReceiver.receive()
                .flatMap(oneMessage -> consume(oneMessage))
                .doOnNext(abc -> System.out.println("successfully consumed " + abc))
                .doOnError(throwable -> System.out.println("something bad happened while consuming: " + throwable.getMessage()));
    }

    private Mono<String> consume(ConsumerRecord<String, String> oneMessage) {
        // This first line is a heavy in-memory computation which transforms the incoming
        // message into the data to be saved. It is a very intensive computation, but has
        // been verified NON-BLOCKING by different tools, and takes 1 second :D
        String transformedStringCpuIntensiveButNonBlocking = transformDataNonBlockingWithIntensiveOperation(oneMessage);
        // Then, just save the transformed data into any REACTIVE repository :)
        return myReactiveRepository.save(transformedStringCpuIntensiveButNonBlocking);
    }
}
I dockerized the app and deployed it to Kubernetes.
With a cloud provider, I can easily run 60 of those containers, i.e. 60 instances of the app.
And suppose, for the sake of this question, that each instance is super resilient and never crashes.
Does it mean that, since the topic has only three partitions, at any given time 57 of the consumers will be wasted?
How can I benefit from scaling up the number of containers when the number of partitions is low?
Upvotes: 0
Views: 453
Reputation: 322
Does it mean, since the topic has only three partitions, that at any time, 57 other consumers will be wasted?
Yes. Within a single consumer group, you can have at most as many active consumers as there are partitions; any extra consumers stay idle.
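To see the arithmetic, here is a toy sketch (not Kafka's actual assignor code) of spreading one group's partitions over its consumers; `activeConsumers` is a hypothetical helper for illustration only:

```java
import java.util.ArrayList;
import java.util.List;

public class AssignmentSketch {
    // Returns how many of `consumers` in one group end up with at least one partition.
    // Each partition is owned by exactly one consumer in the group, so the result
    // can never exceed `partitions`.
    static int activeConsumers(int partitions, int consumers) {
        List<Integer> owned = new ArrayList<>();
        for (int c = 0; c < consumers; c++) owned.add(0);
        for (int p = 0; p < partitions; p++) {
            owned.set(p % consumers, owned.get(p % consumers) + 1);
        }
        return (int) owned.stream().filter(n -> n > 0).count();
    }

    public static void main(String[] args) {
        // 3 partitions, 60 consumers in the same group: only 3 are ever active.
        System.out.println(activeConsumers(3, 60)); // prints 3
    }
}
```

With 3 partitions and 60 consumers in one group, 57 consumers receive no assignment at all.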
How can I benefit from scaling up the number of containers when the number of partitions is low?
You could register each of those containers in its own consumer group. That will work, in the sense that every container then receives events, but each group gets its own copy of every event, so each event ends up processed once per group. As mentioned by @OneCricketeer, a separate event-processing pipeline is the better approach if you don't want to process the same event multiple times.
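A minimal sketch of that workaround, assuming each container can derive a unique id (for example from the pod name or a HOSTNAME env var); the broker address and the group-naming scheme here are placeholders, not something from the question:

```java
import java.util.Properties;

public class PerContainerGroup {
    // Builds consumer properties with a distinct group.id per container.
    // Distinct groups never share partitions, so every container consumes --
    // but every group also receives every event, hence the duplicate processing.
    static Properties consumerProps(String containerId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "the-topic-group-" + containerId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps("container-0").getProperty("group.id")); // prints the-topic-group-container-0
    }
}
```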
Upvotes: 0
Reputation: 191728
since the topic has only three partitions, that at any time, 57 other consumers will be wasted?
Yes. That's how the Kafka consumer API works; the framework you use around it isn't relevant.
benefits from scaling up the number of containers when the number of partitions is low
You need to separate the event processing (saving to a repository) from the actual consumption / poll loop. For example, push transformed events onto a non-blocking, external queue / external API without waiting for a response, then set up an autoscaler on that API endpoint.
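A minimal in-process sketch of that separation, using a `BlockingQueue` as a stand-in for the external queue and a worker pool as a stand-in for the autoscaled processing tier (all names here are illustrative, nothing from a real deployment):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HandoffSketch {
    // The "consumer" side only enqueues and never waits for processing, so the
    // number of workers can scale independently of the number of partitions.
    public static int process(int messages, int workers) {
        BlockingQueue<String> handoff = new ArrayBlockingQueue<>(100);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String msg = handoff.poll(200, TimeUnit.MILLISECONDS);
                        if (msg == null) return;     // queue drained, worker exits
                        processed.incrementAndGet(); // stand-in for the heavy transform + save
                    }
                } catch (InterruptedException ignored) { }
            });
        }
        try {
            // Poll loop: hand each message off and move on immediately.
            for (int i = 0; i < messages; i++) handoff.put("msg-" + i);
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(process(50, 8)); // prints 50
    }
}
```

In a real system the queue would be an external service (another topic with more partitions, a message broker, or an HTTP endpoint), and the worker pool would be a separately scaled set of containers.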
Upvotes: 1