Adeel Hashmi

Reputation: 837

How to save data in parquet format and append entries

I am trying to follow this example to save some data in Parquet format and read it back. If I use write.parquet("filename"), then each subsequent run of the Spark job fails with an error that

"filename" already exists.

If I use the SaveMode.Append option, the Spark job fails with the error

org.apache.spark.sql.AnalysisException: Specifying database name or other qualifiers are not allowed for temporary tables.

Please let me know the best way to ensure new data is simply appended to the Parquet file. Also, can I define primary keys on these Parquet tables?

I am using Spark 1.6.2 on Hortonworks 2.5 system. Here is the code:

// Option 1: peopleDF.write.parquet("people.parquet")

// Option 2:
peopleDF.write.format("parquet").mode(SaveMode.Append).saveAsTable("people.parquet")

// Read in the parquet file created above
val parquetFile = spark.read.parquet("people.parquet")

// Parquet files can also be registered as tables and then used in SQL statements.
parquetFile.registerTempTable("parquetFile")
val teenagers = sqlContext.sql("SELECT * FROM people.parquet")

Upvotes: 4

Views: 7896

Answers (1)

dovka

Reputation: 1061

I believe that if you write with .parquet("...."), you should use .mode("append") rather than saveAsTable with SaveMode.Append:

df.write.mode("append").parquet("....")
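For completeness, here is a minimal sketch of the append-then-read workflow in Spark 1.6 Scala. The path name `people_parquet` and the surrounding session setup are assumptions for illustration, not from the original post; `.mode(SaveMode.Append)` and `.mode("append")` are equivalent for a path-based write:

```scala
import org.apache.spark.sql.{SQLContext, SaveMode}

// Assumes an existing SparkContext `sc` and DataFrame `peopleDF` (Spark 1.6.x, as in the question)
val sqlContext = new SQLContext(sc)

// The first run creates the directory; later runs add new part files alongside the old ones
peopleDF.write.mode(SaveMode.Append).parquet("people_parquet")

// Reading the directory back returns old and new rows together
val people = sqlContext.read.parquet("people_parquet")
people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT * FROM people WHERE age BETWEEN 13 AND 19")
```

As far as I can tell, the AnalysisException in the question likely comes from saveAsTable("people.parquet"): the dot in the name is parsed as a `database.table` qualifier, which is not allowed there. Using a path-based write (or a table name without a dot) avoids it.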

Upvotes: 4
