Reputation: 131
I'm trying to save dataframe in table hive.
In spark 1.6 it's work but after migration to 2.2.0 it doesn't work anymore.
Here's the code:
blocs
.toDF()
.repartition($"col1", $"col2", $"col3", $"col4")
.write
.format("parquet")
.mode(saveMode)
.partitionBy("col1", "col2", "col3", "col4")
.saveAsTable("db".tbl)
The format of the existing table project_bsc_dhr.bloc_views is HiveFileFormat
. It doesn't match the specified format ParquetFileFormat
.;
org.apache.spark.sql.AnalysisException: The format of the existing table project_bsc_dhr.bloc_views is HiveFileFormat
. It doesn't match the specified format ParquetFileFormat
.;
Upvotes: 13
Views: 22211
Reputation: 94
Setting the format("hive")
did not worked for me. How I solved this problem is as below.
Insert the current dataframe into a new temp table, Make sure you have same schema, column types and order of colmns in temp table w.r.t your actual target table.
df.write.format("parquet").partitionBy('date').mode("append") \
.saveAsTable('HiveDB.temp_table', path="s3://some_path/temp_table" )
Now physically copy the files from path "s3://some_path/temp_table/" to your actual target table path. (in my case "s3://some_path/actual_table/" )
Now run the below command through spark.sql or from Hue\Athena
MSCK REPAIR TABLE `actual_table`;
I was facing this issue while writing pyspark dataframe into Glue Catalog table that was created before via AWS Wrangler API.
Upvotes: 0
Reputation: 23
I set TBLPROPERTITES to table and saveAsTable work
ALTER TABLE "db".tbl
SET TBLPROPERTIES ('spark.sql.partitionProvider='catalog',
'spark.sql.sources.provider' = 'parquet')
if need, set SERDEPROPERTIES
ALTER TABLE "db".tbl
SET SERDEPROPERTIES('path'='hdfs:..')
Upvotes: 0
Reputation: 341
I have just tried to use .format("hive")
to saveAsTable
after getting the error and it worked.
I also would not recommend to use insertInto
suggested by the author, because it looks not type-safe (as much as this term can be applied to SQL API) and is error-prone in the way it ignores column names and uses position-base resolution.
Upvotes: 14