Reputation: 127
Can we have multiple consumers to consume from a topic to achieve parallel processing in kafka. My use case is to read messages from a single partition in parallel.
Upvotes: 2
Views: 2881
Reputation: 31
There can be cases when it is required to parallelise processing beyond the number of partitions.
One possible implementation can be what is one of the answer here using Akka Streams Kafka (Reactive kafka)
Easier solution would be to use the parallel-consumer library
It supports partial failures as well and supports batching which makes it very powerful for a number of use cases.
Upvotes: 0
Reputation: 13585
You need multiple partitions to do this, or something like Parallel Consumer (PC) to sub divide the single partition.
However, it's recommended to have at least 3 partitions and have at least three consumers running in a group, to utilise high availability. You can again use PC to process all these partitions, sub divided by key, in parallel.
PC directly solves for this, by sub partitioning the input partitions by key and processing each key in parallel. It also tracks per record acknowledgement. Check out Parallel Consumer on GitHub (it's open source BTW, and I'm the author).
Upvotes: 0
Reputation: 4448
Yes you can process messages in parallel using many Kafka consumers, but no, it's not possible if you only have one partition.
Parallelism in Kafka consuming is defined by the number of partitions, you can easily re-partition your topic at any time to create more partitions.
An example of how process messages in parallel using rapids-kafka-client below, a library to make Kafka parallel consuming easier.
public static void main(String[] args){
ConsumerConfig.<String, String>builder()
.prop(KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName())
.prop(VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName())
.prop(GROUP_ID_CONFIG, "stocks")
.topics("stock_changed")
.consumers(7)
.callback((ctx, record) -> {
System.out.printf("status=consumed, value=%s%n", record.value());
})
.build()
.consume()
.waitFor();
}
Upvotes: 1
Reputation: 2921
The number of partitions define the level of parallelism to read from a kafka topic. But reading is (more or less) only restricted by your network capacities.
A good pattern is to separate reading and processing of messages (one thread per topic-partition for reading and multiple threads for processing this messages).
Upvotes: 0
Reputation: 10282
Simply saying we can't achieve partition level parallelism for Consumers by default.
But you can try Akka Streams Kafka (Reactive kafka). Once go through thsese docs.
Upvotes: 0