Reputation: 516
I came across the setting spark.streaming.kafka.consumer.cache.enabled=false in our application's properties, and surprisingly no one on my team knows how it helps us achieve better performance. It was added on the advice of Cloudera support. I couldn't find a detailed explanation of this property in the Spark docs. Can anyone help me understand how this configuration affects Kafka consumer performance?
Upvotes: 2
Views: 1504
Reputation: 191738
Looking at the source code, you can see that it has a useCache: Boolean value, and it seems to put internal KafkaConsumer objects into this cache, keyed by the group id and the topic+partition assignment.
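To make that concrete, here is a toy sketch of the caching idea in plain Python. It is not Spark's actual code: FakeConsumer, ConsumerPool, and run_batches are hypothetical names I made up for illustration, and the only thing taken from the source is the idea of a boolean flag deciding whether consumers are reused under a (group id, topic, partition) key.

```python
class FakeConsumer:
    """Hypothetical stand-in for a KafkaConsumer; it only tracks its key and state."""

    def __init__(self, group_id, topic, partition):
        self.key = (group_id, topic, partition)
        self.closed = False

    def close(self):
        self.closed = True


class ConsumerPool:
    """Toy pool: reuses consumers per (group id, topic, partition) when use_cache is on."""

    def __init__(self, use_cache):
        self.use_cache = use_cache
        self._cache = {}   # (group_id, topic, partition) -> FakeConsumer
        self.created = 0   # how many consumers were constructed in total

    def acquire(self, group_id, topic, partition):
        key = (group_id, topic, partition)
        if self.use_cache and key in self._cache:
            # Cache hit: hand back the consumer that is already open.
            return self._cache[key]
        consumer = FakeConsumer(group_id, topic, partition)
        self.created += 1
        if self.use_cache:
            self._cache[key] = consumer
        return consumer

    def release(self, consumer):
        # Uncached consumers are closed after use; cached ones stay open for reuse.
        if not self.use_cache:
            consumer.close()


def run_batches(use_cache, batches=3):
    """Simulate several batches reading the same topic-partition; return consumers created."""
    pool = ConsumerPool(use_cache)
    for _ in range(batches):
        consumer = pool.acquire("my-group", "events", 0)
        pool.release(consumer)
    return pool.created
```

In this sketch, run_batches(True) builds one consumer and reuses it, while run_batches(False) builds and closes a fresh one per batch. That is the trade-off the flag controls in the toy model: fewer connection setups with the cache on, no long-lived consumer state with it off.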
I have no idea why not caching consumers would be "more performant", but my guess is that leaving them uncached allows Kafka consumer group rebalancing to operate "better".
If you think this property is missing necessary documentation, I would suggest opening a JIRA.
Upvotes: 1