Arvinth

Reputation: 70

Error while inserting into a partitioned Hive table from Spark Scala

I have a Hive table with the following structure:

CREATE TABLE gcganamrswp_work.historical_trend_result(
  column_name string,
  metric_name string,
  current_percentage string,
  lower_threshold double,
  upper_threshold double,
  calc_status string,
  final_status string,
  support_override string,
  dataset_name string,
  insert_timestamp string,
  appid string,
  currentdate string,
  indicator map<string,string>)
PARTITIONED BY (
  appname string,
  year_month int)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY");

I have a Spark DataFrame with this schema:

root
 |-- metric_name: string (nullable = true)
 |-- column_name: string (nullable = true)
 |-- Lower_Threshold: double (nullable = true)
 |-- Upper_Threshold: double (nullable = true)
 |-- Current_Percentage: double (nullable = true)
 |-- Calc_Status: string (nullable = false)
 |-- Final_Status: string (nullable = false)
 |-- support_override: string (nullable = false)
 |-- Dataset_Name: string (nullable = false)
 |-- insert_timestamp: string (nullable = false)
 |-- appId: string (nullable = false)
 |-- currentDate: string (nullable = false)
 |-- indicator: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = false)
 |-- appname: string (nullable = false)
 |-- year_month: string (nullable = false)

When I try to insert into the Hive table using the code below, it fails:

    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
    data_df.repartition(1)
      .write.mode("append")
      .format("hive")
      .insertInto(Outputhive_table)

Spark version: 2.4.0

Error:

ERROR Hive:1987 - Exception when loading partition with parameters partPath=hdfs://gcgprod/data/work/hive/historical_trend_result/.hive-staging_hive_2021-09-01_04-34-04_254_8783620706620422928-1/-ext-10000/_temporary/0, table=historical_trend_result, partSpec={appname=, year_month=}, replace=false, listBucketingEnabled=false, isAcid=false, hasFollowingStatsTask=false
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Partition spec is incorrect. {appname=, year_month=})
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1662)
    at org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$4(Hive.java:1970)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:Partition spec is incorrect. {appname=, year_month=})
    at org.apache.hadoop.hive.metastore.Warehouse.makePartName(Warehouse.java:329)
    at org.apache.hadoop.hive.metastore.Warehouse.makePartPath(Warehouse.java:312)
    at org.apache.hadoop.hive.ql.metadata.Hive.genPartPathFromTable(Hive.java:1751)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1607)

I have specified the partition columns as the last columns of the DataFrame, so I expect it to treat the last two columns as partition columns. I want to use the same routine for inserting into different tables, so I don't want to mention the partition columns explicitly.
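Since insertInto maps DataFrame columns to table columns by position, one table-agnostic approach is to reorder and cast the DataFrame against the target table's own schema, in which Hive lists the partition columns last. A minimal sketch (the insertAligned helper is hypothetical, introduced here for illustration, and assumes the DataFrame carries every column of the target table under matching names):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    def insertAligned(df: DataFrame, table: String): Unit = {
      // Hive lists partition columns last in the table schema, and insertInto
      // maps DataFrame columns to table columns by position, so selecting in
      // schema order (with a cast per column) lines everything up, including
      // the string-to-int conversion on year_month.
      val targetSchema = df.sparkSession.table(table).schema
      val aligned = df.select(targetSchema.map(f => col(f.name).cast(f.dataType)): _*)
      aligned.write.mode("append").insertInto(table)
    }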

Upvotes: 1

Views: 3206

Answers (2)

Vikas Saxena

Reputation: 1183

Just to recap: you are using Spark to write data to a Hive table with dynamic partitions. My answer below is based on that understanding; if it's incorrect, please feel free to correct me in a comment.

While you have a table that is dynamically partitioned (by appname and year_month), the Spark job doesn't know the partitioning fields of the destination, so you will still have to tell it which fields the destination table is partitioned by.

Something like this should work

import org.apache.spark.sql.SaveMode

data_df.repartition(1)
  .write
  .partitionBy("appname", "year_month")
  .mode(SaveMode.Append)
  .saveAsTable(Outputhive_table)

Make sure that you enable support for dynamic partitions by executing something like

hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

Check out this post by Itai Yaffe; it may be handy: https://medium.com/nmc-techblog/spark-dynamic-partition-inserts-part-1-5b66a145974f

Upvotes: 2

Raphael Roth

Reputation: 27373

I think the problem is that some records have empty strings for appname and year_month. At least this is suggested by

Partition spec is incorrect. {appname=, year_month=}

Make sure partition columns are never empty or null! Also note that the type of year_month is not consistent between the DataFrame and your table schema (string vs. int).
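One way to apply both fixes, as a minimal sketch reusing the data_df and Outputhive_table names from the question: cast year_month to int and drop rows whose partition values are null or empty before inserting.

    import org.apache.spark.sql.functions.{col, trim}

    // Cast year_month to match the table's int type; unparsable strings
    // become null and are caught by the isNotNull filter below. Drop rows
    // whose partition values are null/empty, since Hive cannot build a
    // partition path from an empty partition spec.
    val cleaned = data_df
      .withColumn("year_month", col("year_month").cast("int"))
      .filter(col("appname").isNotNull && trim(col("appname")) =!= "" &&
        col("year_month").isNotNull)

    cleaned.write.mode("append").insertInto(Outputhive_table)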

Upvotes: 1
