Reputation: 683
Spark version = 2.3.0
Kafka version = 1.0.0
Snippet of code being used:
# Kafka endpoints
zkQuorum = '192.168.2.10:2181,192.168.2.12:2181'
topic = 'Test_topic'
# Create a Kafka stream
kafkaStream = KafkaUtils.createStream(ssc, zkQuorum, "cyd-demo-azureactivity-streaming-consumer", {topic: 1})
When the Kafka stream is run in real time, I see Spark pulling data; however, if I start Kafka, say, an hour before Spark, it will not pick up the hour-old data.
Is this expected, or is there a way to set something up in a configuration?
Code run using:
sudo $SPARK_HOME/spark-submit --master local[2] --jars /home/steven/jars/elasticsearch-hadoop-6.3.2.jar,/home/steven/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/steven/code/demo/test.py
Upvotes: 0
Views: 90
Reputation: 151
If you always need to pull the data from the start, then you need to set the Kafka property "auto.offset.reset" so the consumer starts from the earliest available offset. For the new consumer API (the 0-10 integration) the value is "earliest"; the old ZooKeeper-based consumer used by createStream expects "smallest". This will pull the records from the start.
This parameter is part of the Kafka consumer configs - http://kafka.apache.org/documentation.html#newconsumerconfigs
Reference link - https://spark.apache.org/docs/2.3.0/streaming-kafka-0-10-integration.html
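Since the question's snippet is PySpark, here is a minimal sketch of the same idea against the 0-8 receiver API (ssc, zkQuorum, topic and the group id are the names from the question's snippet):

from pyspark.streaming.kafka import KafkaUtils

# 'smallest' is the old-consumer equivalent of 'earliest'; it only applies
# when the consumer group has no offsets committed in ZooKeeper yet
kafkaParams = {'auto.offset.reset': 'smallest'}
kafkaStream = KafkaUtils.createStream(
    ssc, zkQuorum, 'cyd-demo-azureactivity-streaming-consumer',
    {topic: 1}, kafkaParams=kafkaParams)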
There are multiple overloads of createStream; you can use the one that lets you pass the Kafka configs. Sample code for createStream -
import kafka.serializer.StringDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

// "smallest" tells the old ZooKeeper-based consumer to start from the
// earliest available offset when the group has no committed offsets
val kafkaParams = Map(
  "zookeeper.connect" -> "zookeeper1:2181",
  "group.id" -> "spark-streaming-test",
  "auto.offset.reset" -> "smallest"
)
val inputTopic = "input-topic"
val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Map(inputTopic -> 1), StorageLevel.MEMORY_ONLY_SER)
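One caveat: the receiver-based createStream commits offsets to ZooKeeper under the group.id, and auto.offset.reset only applies when the group has no committed offsets yet. If the application has already run with the same group id, switch to a fresh group.id to re-read the topic from the beginning.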
Upvotes: 3