Reputation: 486
I'm pretty new to both Scala and Spark, so I have a very dumb question. I have a DataFrame that I created from Elasticsearch, and I'm trying to write it to S3 in Parquet format. Below is my code block and the error I'm seeing. Can a good Samaritan please undumb me on this one?
val dfSchema = dataFrame.schema.json
// log.info(dfSchema)
dataFrame
.withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
.write
.partitionBy("lastFound")
.mode("append")
.format("parquet")
.option("schema", dfSchema)
.save("/tmp/elasticsearch/")
org.apache.spark.sql.AnalysisException:
Datasource does not support writing empty or nested empty schemas.
Please make sure the data schema has at least one or more column(s).
;
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$validateSchema(DataSource.scala:733)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:523)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
Upvotes: 4
Views: 15294
Reputation: 916
You don't need to pass a schema when you write data in Parquet format; Parquet files store the schema themselves.
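Since the schema travels with the data, you can read the output back and inspect it. A minimal check, assuming a SparkSession named spark and a completed write to /tmp/elasticsearch/:

val readBack = spark.read.parquet("/tmp/elasticsearch/")
// The schema is reconstructed from the Parquet file footers;
// no .option("schema", ...) is needed on either the write or the read.
readBack.printSchema()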
When you use append mode, you are assuming that data is already stored at the path you specify and that you want to add new data to it. If you want to overwrite, use "overwrite" instead of "append"; if the path is new, either mode works.
When you write to S3, the path should normally look like "s3://bucket/folder" (see the sketch after the code below).
Can you try this:
dataFrame
.withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
.write
.partitionBy("lastFound")
.mode("append")
.parquet("/tmp/elasticsearch/")
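If that works locally, the same call can target S3 directly with a full bucket URI. Here is a sketch (the bucket and folder names are placeholders), with a fail-fast check for the empty-schema error from your stack trace, which Spark raises when the source DataFrame resolves to zero columns:

// Surface the real problem early: the AnalysisException above means the
// DataFrame itself has no columns, regardless of any "schema" option.
require(dataFrame.schema.nonEmpty, "DataFrame read from Elasticsearch has no columns")

dataFrame
  .withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
  .write
  .partitionBy("lastFound")
  .mode("append")
  .parquet("s3://my-bucket/elasticsearch/") // placeholder; plain Hadoop setups may need the s3a:// scheme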
Upvotes: 2