Reputation: 70
I am having hive table with following structure
CREATE TABLE gcganamrswp_work.historical_trend_result(
column_name string,
metric_name string,
current_percentage string,
lower_threshold double,
upper_threshold double,
calc_status string,
final_status string,
support_override string,
dataset_name string,
insert_timestamp string,
appid string,
currentdate string,
indicator map<string,string>)
PARTITIONED BY (
appname string,
year_month int)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY");
I am having spark dataframe with schema
root
|-- metric_name: string (nullable = true)
|-- column_name: string (nullable = true)
|-- Lower_Threshold: double (nullable = true)
|-- Upper_Threshold: double (nullable = true)
|-- Current_Percentage: double (nullable = true)
|-- Calc_Status: string (nullable = false)
|-- Final_Status: string (nullable = false)
|-- support_override: string (nullable = false)
|-- Dataset_Name: string (nullable = false)
|-- insert_timestamp: string (nullable = false)
|-- appId: string (nullable = false)
|-- currentDate: string (nullable = false)
|-- indicator: map (nullable = false)
| |-- key: string
| |-- value: string (valueContainsNull = false)
|-- appname: string (nullable = false)
|-- year_month: string (nullable = false)
when i try to insert into hive table using below code it is failing
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
data_df.repartition(1)
.write.mode("append")
.format("hive")
.insertInto(Outputhive_table)
Spark Version : Spark 2.4.0
Error:
ERROR Hive:1987 - Exception when loading partition with parameters partPath=hdfs://gcgprod/data/work/hive/historical_trend_result/.hive-staging_hive_2021-09-01_04-34-04_254_8783620706620422928-1/-ext-10000/_temporary/0, table=historical_trend_result, partSpec={appname=, year_month=}, replace=false, listBucketingEnabled=false, isAcid=false, hasFollowingStatsTask=false org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Partition spec is incorrect. {appname=, year_month=}) at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1662) at org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$4(Hive.java:1970) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: MetaException(message:Partition spec is incorrect. {appname=, year_month=}) at org.apache.hadoop.hive.metastore.Warehouse.makePartName(Warehouse.java:329) at org.apache.hadoop.hive.metastore.Warehouse.makePartPath(Warehouse.java:312) at org.apache.hadoop.hive.ql.metadata.Hive.genPartPathFromTable(Hive.java:1751) at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1607)
I have specified the partition columns in the last columns of dataframe, so i expect it consider last tow columns as partition columns. I wanted to used the same routine for inserting different tables so i don't want to mention the partition columns explicitly
Upvotes: 1
Views: 3206
Reputation: 1183
Just to recap that you are using spark to write data to a hive table with dynamic partitions. So my answer below is based on same, if my understanding is incorrect, please feel free to correct me in comment.
While you have a table that is dynamically partitioned (by app_name and year_month), the spark job doesn't know the partitioning fields in the destination so you will still have to tell your spark job about the partitioning field of the destination table.
Something like this should work
data_df.repartition(1)
.write
.partitionBy("appname", "year_month")
.mode(SaveMode.Append)
.saveAsTable(Outputhive_table)
Make sure that you enable support for dynamic partitions by executing something like
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
Check out this post by Itai Yaffe, this may be handy https://medium.com/nmc-techblog/spark-dynamic-partition-inserts-part-1-5b66a145974f
Upvotes: 2
Reputation: 27373
I think the problem is that some records have appname
and year_month
as strings. At least this is suggested by
Partition spec is incorrect. {appname=, year_month=}
Make sure partition colums are never empty or null! Also note that the type of year_month is not consistent between the DataFrame and your schema (string/int)
Upvotes: 1