Reputation: 161
This probably has multiple questions so bear with me. I am still figuring out the right way to use the Kafka Architecture. I know that the partitions of a topic are divided b/w the consumers.
What exactly are consumers? Right now, I am thinking of writing a daemon python process that acts as a consumer. When the consumer consumes a message from Kafka, there is a task that I have to complete. This is a huge task so I am creating sub-tasks that run concurrently. Can I have multiple consumers(python scripts) on the same machine?
I have multiple microservices that I am working on, so each microservice has its own consumer?
When the load increases I have to scale the consumers. I thought of spawning a new machine that has acts as another consumer. But I just feel that I am doing something wrong here and feel that there has to be a better way.
Can you tell me how you scaled your consumers based on the load? Do I have to increase my partitions in topics if I need to increase my consumers? How do I do it dynamically? Can I decrease the partitions when there are fewer messages produced? How many partitions are ideal initially?
And please suggest some good practices to follow.
This is the consumer script that I am using
while True:
message = client.poll(timeout=10)#client is the KafkaConsumer object
if message is not None:
if message.error():
raise KafkaException(message.error())
else:
logger.info('recieved topic {topic} partition {partition} offset {offset} key {key} - {value}'.format(
topic=message.topic(),
partition=message.partition(),
offset=message.offset(),
key=message.key(),
value=message.value()
))
#run task
Upvotes: 2
Views: 3977
Reputation: 191874
Can I have multiple consumers(python scripts) on the same machine?
Yes. You can also have Python threads, though.
If you're not consuming multiple topics, then there is no need for multiple consumers.
What exactly are consumers?
Feel free to read over the Apache Kafka site...
each microservice has its own consumer?
Is each service running similar code? Then yes.
I thought of spawning a new machine
Spawn new instances of your app on one machine. Monitor CPU and Mem and Network load. Don't get new machines until at least one of those is above say 70% under normal processing.
Do I have to increase my partitions in topics if I need to increase my consumers?
In general, yes. The number of consumers in a consumer group is limited by the number of partitions in the subscribed topics.
Can I decrease the partitions when there are fewer messages produced?
No. Partitions cannot be decreased
When the load increases I have to scale the consumers
Not necessarily. Is the increased load constantly rising, or are there waves of it? If variable, then you can let Kafka buffer the messages. And the consumer will keep polling and processing as fast as it can.
You need to define your SLAs for how long a message will take to process after reaching a topic from a producer.
How many partitions are ideal initially?
There are multiple articles on this, and it depends specifically on your own hardware and application requirements. Simply logging each message, you could have thousands of partitions...
When the consumer consumes a message from Kafka, there is a task that I have to complete
Sounds like you might want to look at Celery, not necessarily just Kafka. You could also look at Faust for Kafka processing
Upvotes: 2