user3679686

Reputation: 516

How does the spark.streaming.kafka.consumer.cache.enabled property work, and how does it affect Kafka consumer performance?

I came across the setting spark.streaming.kafka.consumer.cache.enabled=false in our application's properties, and surprisingly no one on my team knows how it helps us achieve better performance. It was added on the advice of Cloudera support. I couldn't find a detailed explanation of this property in the Spark docs. Can anyone help me understand how this configuration affects Kafka consumer performance?

Upvotes: 2

Views: 1504

Answers (1)

OneCricketeer

Reputation: 191738

Looking at the source code, you can see that it has a useCache: Boolean value, and it seems to put internal KafkaConsumer objects into this cache, keyed by the group id and the topic+partition assignment.
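To make that concrete, here is a minimal sketch of the idea in plain Python. It is not Spark's code: FakeConsumer and ConsumerProvider are hypothetical names made up for illustration, and the only thing it assumes is what is described above, namely a boolean flag that decides whether consumers are reused per (group id, topic, partition) key.

```python
from typing import Dict, Tuple

# Assumed cache key shape: (group id, topic, partition).
CacheKey = Tuple[str, str, int]


class FakeConsumer:
    """Hypothetical stand-in for a Kafka consumer; it connects to nothing."""

    def __init__(self, group_id: str, topic: str, partition: int) -> None:
        self.group_id = group_id
        self.topic = topic
        self.partition = partition


class ConsumerProvider:
    """Hypothetical helper that hands out consumers, optionally reusing them."""

    def __init__(self, use_cache: bool) -> None:
        self.use_cache = use_cache
        self._cache: Dict[CacheKey, FakeConsumer] = {}
        self.created = 0  # how many consumers were actually constructed

    def get(self, group_id: str, topic: str, partition: int) -> FakeConsumer:
        key = (group_id, topic, partition)
        # With caching on, reuse the consumer already built for this key.
        if self.use_cache and key in self._cache:
            return self._cache[key]
        consumer = FakeConsumer(group_id, topic, partition)
        self.created += 1
        if self.use_cache:
            self._cache[key] = consumer
        return consumer


# Caching enabled: the second request reuses the first consumer.
cached = ConsumerProvider(use_cache=True)
a = cached.get("my-group", "events", 0)
b = cached.get("my-group", "events", 0)

# Caching disabled: every request builds a fresh consumer.
uncached = ConsumerProvider(use_cache=False)
c = uncached.get("my-group", "events", 0)
d = uncached.get("my-group", "events", 0)
```

In this toy version, turning the cache off means paying the construction cost on every request, but no consumer object outlives the request that created it.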

I have no idea why not caching consumers would be "more performant", but my guess is that leaving them uncached lets Kafka consumer group rebalancing work "better".

If you think this property is missing necessary documentation, I would suggest opening a JIRA.

Upvotes: 1
