PatPanda

Reputation: 5020

Reactor Kafka - Bottleneck if the number of consumers is greater than the number of partitions

Is the number of partitions a performance bottleneck if the number of Kafka consumers is (much) greater than the number of partitions?

Let's say I have a topic named the-topic with only three partitions.

Now, I have the app below to consume from the topic:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.kafka.receiver.KafkaReceiver;

@Service
public class MyConsumer implements CommandLineRunner {

    @Autowired
    private KafkaReceiver<String, String> kafkaReceiver;

    @Override
    public void run(String... args) {
        myConsumer().subscribe();
    }

    public Flux<String> myConsumer() {
        return kafkaReceiver.receive()
                .flatMap(oneMessage -> consume(oneMessage))
                .doOnNext(abc -> System.out.println("successfully consumed " + abc))
                .doOnError(throwable -> System.out.println("something bad happened while consuming: " + throwable.getMessage()));
    }

    private Mono<String> consume(ConsumerRecord<String, String> oneMessage) {
        // this first line is a heavy in-memory computation which transforms the incoming message into the data to be saved.
        // it is a very intensive computation, but it has been tested NON BLOCKING by different tools, and takes 1 second :D
        String transformedStringCPUIntensiveButNonBlocking = transformDataNonBlockingWithIntensiveOperation(oneMessage);
        // then, just save the transformed data into any REACTIVE repository :)
        return myReactiveRepository.save(transformedStringCPUIntensiveButNonBlocking);
    }

}

I dockerized the app and deployed it in Kubernetes.

With cloud providers, I can easily deploy 60 of those containers, i.e. 60 instances of the app.

And suppose, for the sake of this question, that each of my apps is super resilient and never crashes.

Does it mean that, since the topic has only three partitions, at any given time the 57 other consumers will be wasted?

How can I benefit from scaling up the number of containers when the number of partitions is low?

Upvotes: 0

Views: 453

Answers (2)

Aritra Sur Roy

Reputation: 322

Does it mean that, since the topic has only three partitions, at any given time the 57 other consumers will be wasted?

Yes. Within a single consumer group, at most as many consumers as there are partitions can be active at the same time; any extra members sit idle.

How can I benefit from scaling up the number of containers when the number of partitions is low?

You could register each of those containers in its own consumer group, since every group gets its own assignment of all the partitions. That will work, but every group then receives every event. As mentioned by @OneCricketeer, a separate event-processing pipeline is the better approach if you don't want to process the same event multiple times. See the sketch below.
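
A minimal sketch of that per-container-group idea, assuming a broker at localhost:9092 and a group id derived from the HOSTNAME environment variable (both are placeholders, not taken from the question):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import reactor.kafka.receiver.KafkaReceiver;
import reactor.kafka.receiver.ReceiverOptions;

public class PerContainerGroupReceiver {

    public static KafkaReceiver<String, String> build() {
        // each container picks its own group id, so each container is assigned all three partitions
        String groupId = "the-topic-group-" + System.getenv().getOrDefault("HOSTNAME", "local");

        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        ReceiverOptions<String, String> options = ReceiverOptions.<String, String>create(props)
                .subscription(Collections.singleton("the-topic"));

        return KafkaReceiver.create(options);
    }
}

Keep the trade-off in mind: all 60 groups receive every message, so this only helps for fan-out reads, not for splitting the work.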

Upvotes: 0

OneCricketeer

Reputation: 191728

since the topic has only three partitions, that at any time, 57 other consumers will be wasted?

Yes. That's how the Kafka consumer API works. The framework you use around it isn't relevant.

benefits from scaling up the number of containers when the number of partitions is low

You need to separate the event processing (saving to a repository) from the actual consumption / poll loop. For example, push transformed events onto a non-blocking, external queue / external API, without waiting for a response. Then set up an autoscaler on that API endpoint.
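
A minimal sketch of that separation, assuming a hypothetical autoscaled processing service reachable at http://processor/api/events (the service, URL, and path are placeholders, not something this answer specifies):

import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;
import reactor.kafka.receiver.KafkaReceiver;
import reactor.kafka.receiver.ReceiverRecord;

public class HandOffConsumer {

    private final KafkaReceiver<String, String> kafkaReceiver;
    // hypothetical external API that does the heavy 1-second transform; it scales independently of the partition count
    private final WebClient processorClient = WebClient.create("http://processor");

    public HandOffConsumer(KafkaReceiver<String, String> kafkaReceiver) {
        this.kafkaReceiver = kafkaReceiver;
    }

    public Flux<ReceiverRecord<String, String>> handOff() {
        return kafkaReceiver.receive()
                // the poll loop only forwards the raw record; no CPU-intensive work happens here
                .flatMap(record -> processorClient.post()
                        .uri("/api/events")
                        .bodyValue(record.value())
                        .retrieve()
                        .toBodilessEntity()
                        // acknowledge the offset once the record has been handed off
                        .doOnSuccess(response -> record.receiverOffset().acknowledge())
                        .thenReturn(record));
    }
}

With the heavy work behind the endpoint, three consumers are enough to drain the three partitions, and the autoscaler on the processing service is what grows with load.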

Upvotes: 1
