Ander

Reputation: 513

Spark Structured Streaming app has no jobs and no stages

I have a simple Spark Structured Streaming app that reads from Kafka and writes to HDFS. Today the app has mysteriously stopped working, with no changes or modifications whatsoever (it had been working flawlessly for weeks).

So far, everything I have checked looks normal (see the screenshots below), yet despite all of that, nothing is being written to HDFS anymore. Code snippet:

val inputData = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrap_servers)
  .option("subscribe", "topic-name-here")
  .option("startingOffsets", "latest")
  .option("failOnDataLoss", "false")
  .load()

import org.apache.spark.sql.streaming.{OutputMode, Trigger}

inputData.toDF()
  .repartition(10)
  .writeStream
  .format("parquet")
  .option("checkpointLocation", "hdfs://...")
  .option("path", "hdfs://...")
  .outputMode(OutputMode.Append())
  .trigger(Trigger.ProcessingTime("60 seconds"))
  .start()

Any ideas why the UI shows no jobs/tasks?

[Screenshot: No jobs for the application]

[Screenshot: No tasks and basically no activity]

[Screenshot: Query Progress]

Upvotes: 5

Views: 1200

Answers (1)

Ander

Reputation: 513

For anyone facing the same issue, I found the culprit:

Somehow the data within _spark_metadata in the HDFS directory where I was saving the output got corrupted.

The solution was to delete that directory and restart the application, which re-created the directory. After that, data started flowing again.
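For reference, the deletion step can be done with the hdfs CLI. This is a minimal sketch: the output path below is a hypothetical placeholder (the original post truncates the real path as "hdfs://..."), so substitute the directory from your writeStream "path" option. Note this discards the sink's file-tracking metadata, so only do it if you can tolerate reprocessing or duplicate files.

```shell
#!/usr/bin/env bash
# Hypothetical output path -- replace with the real "path" from your writeStream.
OUTPUT_DIR="hdfs:///user/ander/output"
# The sink metadata directory Spark maintains inside the output path.
METADATA_DIR="${OUTPUT_DIR}/_spark_metadata"

echo "Removing ${METADATA_DIR}"
if command -v hdfs >/dev/null 2>&1; then
  # Recursively delete the (possibly corrupted) metadata directory.
  hdfs dfs -rm -r "${METADATA_DIR}"
else
  echo "hdfs CLI not found; run this on a cluster node"
fi
```

After deleting, restart the streaming application and Spark will re-create _spark_metadata on the next successful batch.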

Upvotes: 5
