Ayan Biswas

Reputation: 1635

Error While Writing into a Hive table from Spark Sql

I am trying to insert data into a Hive external table from Spark SQL. I created the Hive external table with the following command:

CREATE EXTERNAL TABLE  tab1 ( col1 type,col2 type ,col3 type) CLUSTERED BY (col1,col2) SORTED BY (col1) INTO 8 BUCKETS STORED AS PARQUET

In my Spark job, I have written the following code:

Dataset<Row> df = session.read().option("header", "true").csv(csvInput);

df.repartition(numBuckets, somecol)
                  .write()
                  .format("parquet")
                  .bucketBy(numBuckets,col1,col2)
                  .sortBy(col1)
                  .saveAsTable(hiveTableName);

Each time I run this code I get the following exception:

org.apache.spark.sql.AnalysisException: Table `tab1` already exists.;
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:408)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393)
    at somepackage.Parquet_Read_WriteNew.writeToParquetHiveMetastore(Parquet_Read_WriteNew.java:100)

Upvotes: 2

Views: 13090

Answers (2)

Pawan B

Reputation: 4623

You should specify a save mode when saving the data into Hive.

df.write().mode(SaveMode.Append)
          .format("parquet")
          .bucketBy(numBuckets, col1, col2)
          .sortBy(col1)
          .insertInto(hiveTableName);

Spark provides the following save modes:

ErrorIfExists: Throws an exception if the target already exists; if it doesn't, writes the data out. This is the default.

Append: If the target already exists, appends the data to it; otherwise writes the data out.

Overwrite: If the target already exists, deletes it first, then writes the data out.

Ignore: If the target already exists, silently skips the write; otherwise writes the data out.
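The four modes boil down to a small decision table on "does the target already exist?". Here is a stdlib-only sketch of just those semantics (this is not Spark code; `Mode` and `resolve` are hypothetical names used only to illustrate the behavior listed above):

```java
// Simulates the behavior of Spark's four save modes, as described above.
// The ERROR_IF_EXISTS branch mirrors the AnalysisException the asker hit.
enum Mode { ERROR_IF_EXISTS, APPEND, OVERWRITE, IGNORE }

class SaveModeDemo {
    // Returns the action a writer would take for the given mode,
    // or throws if the mode forbids writing to an existing target.
    static String resolve(Mode mode, boolean targetExists) {
        switch (mode) {
            case ERROR_IF_EXISTS:
                if (targetExists) throw new IllegalStateException("Table already exists");
                return "write";
            case APPEND:
                return targetExists ? "append" : "write";
            case OVERWRITE:
                return targetExists ? "delete-then-write" : "write";
            case IGNORE:
                return targetExists ? "skip" : "write";
        }
        throw new AssertionError("unreachable");
    }
}
```

With the table pre-created by the DDL above, the default ErrorIfExists path is exactly what produces the `Table already exists` error, which is why switching to Append (or using insertInto) resolves it.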

Upvotes: 6

Amit Kumar

Reputation: 1584

You are using the saveAsTable API, which creates the table in Hive. Since you have already created the Hive table with the DDL command, the table tab1 already exists, so when the Spark API tries to create it, it throws the error: org.apache.spark.sql.AnalysisException: Table `tab1` already exists.

Either drop the table and let the saveAsTable API create it itself, or use the insertInto API to insert into the existing Hive table:

df.repartition(numBuckets, somecol)
                  .write()
                  .format("parquet")
                  .bucketBy(numBuckets,col1,col2)
                  .sortBy(col1)
                  .insertInto(hiveTableName);

Upvotes: 6
