Incognito
Incognito

Reputation: 145

Spark Partitioning Hive Table

I am trying to partition the hive table with distinct timestamps. I have a table with timestamps in it but when I execute the hive partition query, it says that it is not a valid partition column. Here's the table:

+---+-----------------------+
|id |rc_timestamp           |
+---+-----------------------+
|1  |2017-06-12 17:18:39.824|
|2  |2018-06-12 17:18:39.824|
|3  |2019-06-12 17:18:39.824|
+---+-----------------------+
spark.sql("SET hive.exec.dynamic.partition.mode=nonrestrict")

val tempTable  = spark.sql("SELECT * FROM partition_table")

val df = tempTable.select("rc_timestamp")

val a = x.toString().replaceAll("[\\[\\]]","")

df.collect().foreach(a => {
  spark.sql(s"ALTER TABLE mydb.partition_table ADD IF NOT EXISTS PARTITION 
  (rc_timestamp = '$a')").show()
)}

Here's the error which I'm getting:

org.apache.spark.sql.AnalysisException: rc_timestamp is not a valid partition column 
in table mydb.partition_table.;

Upvotes: 3

Views: 9418

Answers (2)

Ram Ghadiyaram
Ram Ghadiyaram

Reputation: 29227

First thing is check your syntaxes using this InsertSuite test case specially this

AFAIK you need msck repair or refresh table

spark.sql(s"refresh table tableNameWhereYouAddedPartitions")

what it does is it will refresh the existing partitions.

you can go with spark.sql('MSCK REPAIR TABLE table_name')

There is something called recoverPartitions (Only works with a partitioned table, and not a view). This is aliased version of msck repair table. you can go ahead and try this..

see this ddl.scala seems like its equalent by documentation.
example usage :

spark.catalog.recoverPartitions(tableName) 

Note: The RECOVER PARTITIONS clause automatically recognizes any data files present in these new directories, the same as the REFRESH statement does.

Upvotes: 3

moriarty007
moriarty007

Reputation: 2224

You cannot change the partitioning scheme on a Hive table. This would have to rewrite the complete dataset since partitions are mapped to folders in HDFS/S3/FileSystem.

If you want to change partition scheme, the only options is to create a new table and give partitioning information in the create table command. After that you have to insert data into new table from the old table. You can also use the CTAS command for the same.

ALTER TABLE mydb.partition_table ADD IF NOT EXISTS PARTITION <(rc_timestamp = '$a')> - command only adds new partitions in the metastore for an existing partitioned Hive table. For example, let say you have a table T1 which is partitioned on column year. If you want to make metastore aware about "year=2018", then this command is used.

Upvotes: 1

Related Questions