BdEngineer

Reputation: 3199

Getting OffsetOutOfRangeException even after setting "auto.offset.reset" to "latest"

I am using spark-sql 2.4.1 with Kafka 0.10.

When I try to consume data with the consumer, it throws the error below even after setting "auto.offset.reset" to "latest":

org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {COMPANY_INBOUND-16=168}
    at org.apache.kafka.clients.consumer.internals.Fetcher.throwIfOffsetOutOfRange(Fetcher.java:348)
    at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:396)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:999)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer.fetchData(KafkaDataConsumer.scala:470)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer.org$apache$spark$sql$kafka010$InternalKafkaConsumer$$fetchRecord(KafkaDataConsumer.scala:361)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:251)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:234)
    at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:209)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer.get(KafkaDataConsumer.scala:234)

Where is the issue? Why is the setting not working? How should it be fixed?

Part 2:

    .readStream()
        .format("kafka")
        .option("startingOffsets", "latest")
        .option("enable.auto.commit", false)
        .option("maxOffsetsPerTrigger", 1000)
        .option("auto.offset.reset", "latest")
        .option("failOnDataLoss", false)
        .load();

Upvotes: 2

Views: 1507

Answers (1)

Vincent

Reputation: 591

auto.offset.reset is ignored by Spark Structured Streaming; use the startingOffsets option instead. From the Spark Kafka integration guide:

auto.offset.reset: Set the source option startingOffsets to specify where to start instead. Structured Streaming manages which offsets are consumed internally, rather than rely on the kafka Consumer to do it. This will ensure that no data is missed when new topics/partitions are dynamically subscribed. Note that startingOffsets only applies when a new streaming query is started, and that resuming will always pick up from where the query left off.

Source: Spark Structured Streaming + Kafka Integration Guide
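
For illustration, here is a minimal sketch of the reader from the question with the ignored Kafka consumer settings (auto.offset.reset, enable.auto.commit) dropped. This is a sketch only: the SparkSession setup, app name, bootstrap servers, and checkpoint-free first run are assumptions, since the question does not show them; the topic name is taken from the error message.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class KafkaReadSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("kafka-offset-example") // assumed app name
                    .getOrCreate();

            // startingOffsets replaces auto.offset.reset. It only applies when
            // the query starts without a checkpoint; on restart the query
            // resumes from the checkpointed offsets.
            Dataset<Row> df = spark
                    .readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "host1:9092") // placeholder
                    .option("subscribe", "COMPANY_INBOUND")
                    .option("startingOffsets", "latest")
                    .option("maxOffsetsPerTrigger", 1000)
                    // lets the query continue instead of failing when offsets
                    // have already been deleted by Kafka retention
                    .option("failOnDataLoss", false)
                    .load();
        }
    }

Note that per-consumer Kafka settings passed to the source must be prefixed with "kafka." to take effect at all, and enable.auto.commit is unnecessary here because the Kafka source never commits offsets; Structured Streaming tracks progress in its own checkpoint.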

Upvotes: 4
