Michael

Reputation: 111

Exception java.util.NoSuchElementException: None.get in Spark Dataset save() operation

I got the exception "java.util.NoSuchElementException: None.get" when I tried to save a Dataset to S3 storage as Parquet:

The exception:

java.lang.IllegalStateException: Failed to execute CommandLineRunner
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:787)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:768)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:322)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1215)
...

Caused by: java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker$.metrics(BasicWriteStatsTracker.scala:173)
at org.apache.spark.sql.execution.command.DataWritingCommand$class.metrics(DataWritingCommand.scala:51)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.metrics$lzycompute(InsertIntoHadoopFsRelationCommand.scala:47)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.metrics(InsertIntoHadoopFsRelationCommand.scala:47)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.metrics$lzycompute(commands.scala:100)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.metrics(commands.scala:100)
at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:56)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:76)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)

It looks like the issue is related to the SparkContext. I didn't create an instance of SparkContext explicitly; instead, I use only a SparkSession in my source code:

import org.apache.spark.sql.SparkSession;

// The session is obtained from the builder only; no SparkContext is created explicitly.
final SparkSession sparkSession = SparkSession
            .builder()
            .appName("Java Spark SQL job")
            .getOrCreate();

// ds is a Dataset built elsewhere (see Update 1); path is the target S3 location.
ds.write().mode("overwrite").parquet(path);

Any suggestions or workarounds? Thanks.

Update 1:

The creation of ds is a little complicated, but I will try to list the main call stacks below:

Process 1:

Process 2:

After step 6, I loop back to step 1 with the same Spark session for the next process. Finally, sparkSession.stop() is called after all processes have finished.
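
For reference, here is a rough sketch of how the session is shared across the processes; buildDataset() and outputPaths are placeholders standing in for the real pipeline, not my actual code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

final SparkSession sparkSession = SparkSession
            .builder()
            .appName("Java Spark SQL job")
            .getOrCreate();

// One iteration per process; the same SparkSession instance is reused throughout.
// outputPaths and buildDataset(...) are hypothetical placeholders.
for (String path : outputPaths) {
    Dataset<Row> ds = buildDataset(sparkSession); // steps 1-6 of each process
    ds.write().mode("overwrite").parquet(path);   // the save that fails in process 2
}

sparkSession.stop(); // intended to run only once, after all processes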

I can find the following log line after process 1 completed, which seems to indicate that the SparkContext was destroyed before process 2:

INFO SparkContext: Successfully stopped SparkContext

Upvotes: 2

Views: 10301

Answers (1)

Michael

Reputation: 111

Simply removing sparkSession.stop() solved this issue. Stopping the session also stops the underlying SparkContext, and the write path shown in the stack trace (BasicWriteJobStatsTracker.metrics) looks up the active SparkContext, so it fails with None.get once the context has already been stopped.
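
As a rough sketch of the adjusted flow (buildDataset() and outputPaths are the same hypothetical placeholders as in the question, not real code):

SparkSession sparkSession = SparkSession
            .builder()
            .appName("Java Spark SQL job")
            .getOrCreate();

for (String path : outputPaths) {
    Dataset<Row> ds = buildDataset(sparkSession); // the shared session stays active for every write
    ds.write().mode("overwrite").parquet(path);
}
// Note: sparkSession.stop() is no longer called anywhere in the job.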

Upvotes: -1
