Reputation: 486
I'm pretty new to both Scala and Spark, so I have a very dumb question. I have a DataFrame that I created from Elasticsearch, and I'm trying to write it to S3 in Parquet format. Below is my code block and the error I'm seeing. Can a good Samaritan please undumb me on this one?
val dfSchema = dataFrame.schema.json
// log.info(dfSchema)
dataFrame
.withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
.write
.partitionBy("lastFound")
.mode("append")
.format("parquet")
.option("schema", dfSchema)
.save("/tmp/elasticsearch/")
org.apache.spark.sql.AnalysisException:
Datasource does not support writing empty or nested empty schemas.
Please make sure the data schema has at least one or more column(s).
;
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$validateSchema(DataSource.scala:733)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:523)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
Upvotes: 4
Views: 15294
Reputation: 916
You don't need to pass a schema when you write data in Parquet format; Parquet files store the schema themselves.
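Since the schema travels with the data, you can read the output back and inspect it. A minimal check, assuming a SparkSession named spark and a completed write to /tmp/elasticsearch/:

val readBack = spark.read.parquet("/tmp/elasticsearch/")
// The schema is reconstructed from the Parquet file footers;
// no .option("schema", ...) is needed on either the write or the read.
readBack.printSchema()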
When you use append mode, you are assuming that data is already stored at the path you specify and that you want to add new data to it. If you want to overwrite, use "overwrite" instead of "append"; if the path is new, either mode works.
When you write to S3, the path should normally look like "s3://bucket/folder" (see the sketch after the code below).
Can you try this:
dataFrame
.withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
.write
.partitionBy("lastFound")
.mode("append")
.parquet("/tmp/elasticsearch/")
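If that works locally, the same call can target S3 directly with a full bucket URI. Here is a sketch (the bucket and folder names are placeholders), with a fail-fast check for the empty-schema error from your stack trace, which Spark raises when the source DataFrame resolves to zero columns:

// Surface the real problem early: the AnalysisException above means the
// DataFrame itself has no columns, regardless of any "schema" option.
require(dataFrame.schema.nonEmpty, "DataFrame read from Elasticsearch has no columns")

dataFrame
  .withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
  .write
  .partitionBy("lastFound")
  .mode("append")
  .parquet("s3://my-bucket/elasticsearch/") // placeholder; plain Hadoop setups may need the s3a:// scheme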
Upvotes: 2