Reputation: 37
Currently I have one Kafka topic.
Now I need to run multiple consumer so that message can be read and processed in parallel.
Is this possible.
I am using python and pykafka library.
consumer = topic.get_simple_consumer(consumer_group=b"charlie",
auto_commit_enable=True)
Is taking same message in both consumer. I need to process message only once.
Upvotes: 2
Views: 1304
Reputation: 13585
Generally you need multiple partitions and multiple consumer to do this, or, something like Parallel Consumer (PC) to sub divide the single partition.
However, it's recommended to have at least 3 partitions and have at least three consumers running in a group, to utilise high availability. You can again use PC to process all these partitions, sub divided by key, in parallel.
PC directly solves for this, by sub partitioning the input partitions by key and processing each key in parallel. It also tracks per record acknowledgement. Check out Parallel Consumer on GitHub (it's open source BTW, and I'm the author).
Upvotes: 0
Reputation: 6207
You need to use BalancedConsumer
instead of SimpleConsumer
:
consumer = topic.get_balanced_consumer(consumer_group=b"charlie",
auto_commit_enable=True)
You should also ensure that the topic you're consuming has at least as many partitions as the number of consumers you're instantiating.
Upvotes: 2
Reputation: 1886
Yes you can have multiple consumers reading from the same topic in parallel provided you use the same consumer group id and the number of partitions of the topic should greater than the consumers otherwise some of the consumers will not be assigned any partitions and those consumers won't fetch any data
Upvotes: -1