Naresh

Reputation: 5397

Direct Stream based SparkStreaming with Kafka showing only one Consumer-ID

I am using Kafka 2.3.0 with Spark 2.2.1 and Scala 2.11. I am using the Direct Stream approach, in which the driver queries the latest offsets and decides the offset ranges for each streaming batch; the executors then read the data from Kafka using those offset ranges. As you can see below, I have a topic named test-kafka which has 4 partitions distributed across the two broker leaders.

Now, I start Spark Streaming, which reads the data from the same topic using the following configuration:

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localnosql1:9092,localnosql2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "streaming-group",
      "auto.offset.reset" -> "earliest",
      "auto.commit.interval.ms" -> "1000",
      "enable.auto.commit" -> (true: java.lang.Boolean)
    )
    val topics = Array("test-kafka")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )
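If you want to see exactly which partitions and offset ranges each batch covers, the spark-streaming-kafka-0-10 API exposes them via `HasOffsetRanges` on the batch RDD. A minimal sketch, assuming the `stream` defined above:

```scala
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

// Log the offset range each executor will read, per partition.
// The cast only works on the RDD produced directly by the stream.
stream.foreachRDD { rdd =>
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach { o =>
    println(s"topic=${o.topic} partition=${o.partition} " +
      s"from=${o.fromOffset} until=${o.untilOffset}")
  }
}
```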

So, when I check the consumer-group info on the CLI, it shows that only one consumer-id has been assigned. Does it mean only one consumer is consuming the data from Kafka? If yes, why is that? I have two executors running on the same machine where Kafka is running, as shown below.
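For reference, the consumer-group state can be inspected with the standard Kafka CLI tool (using the `group.id` from the configuration above):

```shell
# Describe the streaming consumer group: lists consumer-id, host,
# and partition assignment for each member.
kafka-consumer-groups.sh --bootstrap-server localnosql1:9092 \
  --describe --group streaming-group
```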


Upvotes: 3

Views: 427

Answers (1)

Hrishikesh Mishra

Reputation: 3433

It could be the case that you are committing offsets from the Driver (which is valid). The Driver fetches the offsets from Kafka and hands them over to the Executors, where KafkaRDD pulls the actual data from Kafka. After processing the batch, the Driver commits the offsets back to Kafka, so only the Driver's single consumer-id shows up in the group.
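The driver-side commit described above usually looks like the following sketch, using `CanCommitOffsets` from the spark-streaming-kafka-0-10 API (with this pattern, `enable.auto.commit` would be set to `false`):

```scala
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

// The Driver captures each batch's offset ranges and, once the batch
// has been processed on the executors, commits them back to Kafka.
stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process the batch on the executors ...
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
```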

I had the same question; here is the link: Spark Streaming Direct Kafka Consumers are not evenly distrubuted across executors

Upvotes: 2
