Vineeth Vishwanath

Reputation: 161

How do I scale Kafka Consumers in python?

This post probably contains multiple questions, so bear with me. I am still figuring out the right way to use the Kafka architecture. I know that the partitions of a topic are divided between the consumers.

What exactly are consumers? Right now, I am thinking of writing a daemon Python process that acts as a consumer. When the consumer consumes a message from Kafka, there is a task that I have to complete. It is a huge task, so I create sub-tasks that run concurrently. Can I have multiple consumers (Python scripts) on the same machine?

I am working on multiple microservices; should each microservice have its own consumer?

When the load increases, I have to scale the consumers. I thought of spawning a new machine that acts as another consumer. But I feel that I am doing something wrong here and that there has to be a better way.

Can you tell me how you scaled your consumers based on the load? Do I have to increase my partitions in topics if I need to increase my consumers? How do I do it dynamically? Can I decrease the partitions when there are fewer messages produced? How many partitions are ideal initially?

And please suggest some good practices to follow.

This is the consumer script that I am using:

while True:
    message = client.poll(timeout=10)  # client is the Kafka consumer object
    if message is not None:
        if message.error():
            raise KafkaException(message.error())
        else:
            logger.info('received topic {topic} partition {partition} offset {offset} key {key} - {value}'.format(
                topic=message.topic(),
                partition=message.partition(),
                offset=message.offset(),
                key=message.key(),
                value=message.value()
            ))
            # run task

Upvotes: 2

Views: 3977

Answers (1)

OneCricketeer

Reputation: 191874

Can I have multiple consumers(python scripts) on the same machine?

Yes. You can also have Python threads, though.
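Since a consumer instance is not safe to share between threads, a common pattern is one polling loop that hands the heavy work to a thread pool. A minimal sketch of that fan-out, with the Kafka poll loop replaced by a plain list of fake payloads (and a stand-in `handle` function) so it runs self-contained:

```python
from concurrent.futures import ThreadPoolExecutor

def handle(value):
    # Stand-in for the "huge task" mentioned in the question.
    return value.upper()

# Simulated payloads; in the real script these come from client.poll().
messages = ['a', 'b', 'c', 'd']

# One polling thread, many workers: only the message payloads cross
# thread boundaries, never the consumer object itself.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle, messages))

print(results)  # ['A', 'B', 'C', 'D']
```

In the real script you would call `pool.submit(handle, message.value())` inside the `while True` poll loop instead of `pool.map` over a list.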

If you're not consuming multiple topics, then there is no need for multiple consumers.

What exactly are consumers?

Feel free to read over the Apache Kafka site...

each microservice has its own consumer?

Is each service running similar code? Then yes.

I thought of spawning a new machine

Spawn new instances of your app on one machine. Monitor CPU, memory, and network load. Don't add new machines until at least one of those is above, say, 70% under normal processing.
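That rule of thumb can be captured in a small helper. A sketch, assuming you already collect utilisation as fractions from your monitoring (the 70% cut-off is the rule of thumb above, not a hard limit):

```python
def should_scale_out(cpu, mem, net, threshold=0.70):
    """Return True when any resource utilisation (0.0-1.0) exceeds the threshold."""
    return any(x > threshold for x in (cpu, mem, net))

print(should_scale_out(0.45, 0.80, 0.10))  # True: memory is over 70%
print(should_scale_out(0.30, 0.30, 0.30))  # False: everything is comfortable
```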

Do I have to increase my partitions in topics if I need to increase my consumers?

In general, yes. The number of consumers in a consumer group is limited by the number of partitions in the subscribed topics.
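The consequence is easy to see with a toy round-robin assignment. This simulates the *effect* of a group rebalance (not Kafka's actual protocol): each partition goes to exactly one group member, so any member beyond the partition count sits idle:

```python
def assign_round_robin(partitions, consumers):
    """Spread each partition over exactly one consumer, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 4 partitions shared by a group of 6 consumers: two get nothing.
result = assign_round_robin(range(4), [f'c{i}' for i in range(6)])
idle = [c for c, parts in result.items() if not parts]
print(idle)  # ['c4', 'c5']
```

So adding consumers beyond the partition count buys you nothing; you must add partitions first.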

Can I decrease the partitions when there are fewer messages produced?

No. The partition count of a topic cannot be decreased.

When the load increases I have to scale the consumers

Not necessarily. Is the increased load constantly rising, or does it come in waves? If it is variable, you can let Kafka buffer the messages, and the consumer will keep polling and processing as fast as it can.

You need to define your SLAs for how long a message will take to process after reaching a topic from a producer.
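Whether that buffer is growing or draining is visible as consumer lag: the log-end offset minus the committed offset, per partition. A sketch of the arithmetic with made-up offset numbers (real values would come from the broker and your consumer group):

```python
def consumer_lag(end_offsets, committed):
    """Lag per partition: messages appended to the log but not yet committed."""
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}

lag = consumer_lag({0: 120, 1: 95}, {0: 100, 1: 95})
print(lag)  # {0: 20, 1: 0}
```

If lag trends back to zero between spikes, buffering is enough; if it grows without bound, you have outrun your SLA and need more partitions and consumers.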

How many partitions are ideal initially?

There are multiple articles on this, and it depends on your own hardware and application requirements. If all you do is log each message, you could handle thousands of partitions...

When the consumer consumes a message from Kafka, there is a task that I have to complete

Sounds like you might want to look at Celery, not necessarily just Kafka. You could also look at Faust for Kafka stream processing.

Upvotes: 2
