Vineeth Vishwanath

Reputation: 161

How do I scale Kafka Consumers in python?

This post probably contains multiple questions, so bear with me. I am still figuring out the right way to use the Kafka architecture. I know that the partitions of a topic are divided between the consumers.

What exactly are consumers? Right now, I am thinking of writing a daemon Python process that acts as a consumer. When the consumer consumes a message from Kafka, there is a task that I have to complete. It is a huge task, so I create sub-tasks that run concurrently. Can I have multiple consumers (Python scripts) on the same machine?

I am working on multiple microservices; should each microservice have its own consumer?

When the load increases, I have to scale the consumers. I thought of spawning a new machine that acts as another consumer. But I feel that I am doing something wrong here and that there has to be a better way.

Can you tell me how you scaled your consumers based on the load? Do I have to increase my partitions in topics if I need to increase my consumers? How do I do it dynamically? Can I decrease the partitions when there are fewer messages produced? How many partitions are ideal initially?

And please suggest some good practices to follow.

This is the consumer script that I am using:

while True:
    message = client.poll(timeout=10)  # client is the Kafka consumer object
    if message is not None:
        if message.error():
            raise KafkaException(message.error())
        else:
            logger.info('received topic {topic} partition {partition} offset {offset} key {key} - {value}'.format(
                topic=message.topic(),
                partition=message.partition(),
                offset=message.offset(),
                key=message.key(),
                value=message.value()
            ))
            # run task

Upvotes: 2

Views: 3977

Answers (1)

OneCricketeer

Reputation: 191874

Can I have multiple consumers(python scripts) on the same machine?

Yes. You can also have Python threads, though.
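Since a consumer instance is not safe to share between threads, a common pattern is one polling loop that hands the heavy work to a thread pool. A minimal sketch of that fan-out, with the Kafka poll loop replaced by a plain list of fake payloads (and a stand-in `handle` function) so it runs self-contained:

```python
from concurrent.futures import ThreadPoolExecutor

def handle(value):
    # Stand-in for the "huge task" mentioned in the question.
    return value.upper()

# Simulated payloads; in the real script these come from client.poll().
messages = ['a', 'b', 'c', 'd']

# One polling thread, many workers: only the message payloads cross
# thread boundaries, never the consumer object itself.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle, messages))

print(results)  # ['A', 'B', 'C', 'D']
```

In the real script you would call `pool.submit(handle, message.value())` inside the `while True` poll loop instead of `pool.map` over a list.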

If you're not consuming multiple topics, then there is no need for multiple consumers.

What exactly are consumers?

Feel free to read over the Apache Kafka site...

each microservice has its own consumer?

Is each service running similar code? Then yes.

I thought of spawning a new machine

Spawn new instances of your app on one machine. Monitor CPU, memory, and network load. Don't add new machines until at least one of those is above, say, 70% under normal processing.
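That rule of thumb can be captured in a small helper. A sketch, assuming you already collect utilisation as fractions from your monitoring (the 70% cut-off is the rule of thumb above, not a hard limit):

```python
def should_scale_out(cpu, mem, net, threshold=0.70):
    """Return True when any resource utilisation (0.0-1.0) exceeds the threshold."""
    return any(x > threshold for x in (cpu, mem, net))

print(should_scale_out(0.45, 0.80, 0.10))  # True: memory is over 70%
print(should_scale_out(0.30, 0.30, 0.30))  # False: everything is comfortable
```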

Do I have to increase my partitions in topics if I need to increase my consumers?

In general, yes. The number of consumers in a consumer group is limited by the number of partitions in the subscribed topics.
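The consequence is easy to see with a toy round-robin assignment. This simulates the *effect* of a group rebalance (not Kafka's actual protocol): each partition goes to exactly one group member, so any member beyond the partition count sits idle:

```python
def assign_round_robin(partitions, consumers):
    """Spread each partition over exactly one consumer, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 4 partitions shared by a group of 6 consumers: two get nothing.
result = assign_round_robin(range(4), [f'c{i}' for i in range(6)])
idle = [c for c, parts in result.items() if not parts]
print(idle)  # ['c4', 'c5']
```

So adding consumers beyond the partition count buys you nothing; you must add partitions first.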

Can I decrease the partitions when there are fewer messages produced?

No. The partition count of a topic cannot be decreased.

When the load increases I have to scale the consumers

Not necessarily. Is the increased load constantly rising, or does it come in waves? If it is variable, you can let Kafka buffer the messages, and the consumer will keep polling and processing as fast as it can.

You need to define your SLAs for how long a message will take to process after reaching a topic from a producer.
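Whether that buffer is growing or draining is visible as consumer lag: the log-end offset minus the committed offset, per partition. A sketch of the arithmetic with made-up offset numbers (real values would come from the broker and your consumer group):

```python
def consumer_lag(end_offsets, committed):
    """Lag per partition: messages appended to the log but not yet committed."""
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}

lag = consumer_lag({0: 120, 1: 95}, {0: 100, 1: 95})
print(lag)  # {0: 20, 1: 0}
```

If lag trends back to zero between spikes, buffering is enough; if it grows without bound, you have outrun your SLA and need more partitions and consumers.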

How many partitions are ideal initially?

There are multiple articles on this, and it depends on your own hardware and application requirements. If all you do is log each message, you could handle thousands of partitions...

When the consumer consumes a message from Kafka, there is a task that I have to complete

Sounds like you might want to look at Celery, not necessarily just Kafka. You could also look at Faust for Kafka stream processing.

Upvotes: 2
