Reputation: 145
I am trying to partition the hive table with distinct timestamps. I have a table with timestamps in it but when I execute the hive partition query, it says that it is not a valid partition column. Here's the table:
+---+-----------------------+
|id |rc_timestamp |
+---+-----------------------+
|1 |2017-06-12 17:18:39.824|
|2 |2018-06-12 17:18:39.824|
|3 |2019-06-12 17:18:39.824|
+---+-----------------------+
spark.sql("SET hive.exec.dynamic.partition.mode=nonrestrict")
val tempTable = spark.sql("SELECT * FROM partition_table")
val df = tempTable.select("rc_timestamp")
val a = x.toString().replaceAll("[\\[\\]]","")
df.collect().foreach(a => {
spark.sql(s"ALTER TABLE mydb.partition_table ADD IF NOT EXISTS PARTITION
(rc_timestamp = '$a')").show()
)}
Here's the error which I'm getting:
org.apache.spark.sql.AnalysisException: rc_timestamp is not a valid partition column
in table mydb.partition_table.;
Upvotes: 3
Views: 9418
Reputation: 29227
First thing is check your syntaxes using this InsertSuite test case specially this
AFAIK you need msck repair or refresh table
spark.sql(s"refresh table tableNameWhereYouAddedPartitions")
what it does is it will refresh the existing partitions.
you can go with spark.sql('MSCK REPAIR TABLE table_name')
There is something called recoverPartitions
(Only works with a partitioned table, and not a view). This is aliased version of msck repair table
. you can go ahead and try this..
see this ddl.scala seems like its equalent by documentation.
example usage :
spark.catalog.recoverPartitions(tableName)
Note: The RECOVER PARTITIONS clause automatically recognizes any data files present in these new directories, the same as the REFRESH statement does.
Upvotes: 3
Reputation: 2224
You cannot change the partitioning scheme on a Hive table. This would have to rewrite the complete dataset since partitions are mapped to folders in HDFS/S3/FileSystem.
If you want to change partition scheme, the only options is to create a new table and give partitioning information in the create table
command. After that you have to insert data into new table from the old table. You can also use the CTAS command for the same.
ALTER TABLE mydb.partition_table ADD IF NOT EXISTS PARTITION <(rc_timestamp = '$a')>
- command only adds new partitions in the metastore for an existing partitioned Hive table. For example, let say you have a table T1
which is partitioned on column year
. If you want to make metastore aware about "year=2018", then this command is used.
Upvotes: 1