Reputation: 287
I am trying to identify the partition Column names in a hive table using Spark .I am able to do that using show partitions followed by parsing the resultset to extract the partition columns .However , the drawback is , if some of the tales do not have a partition in them , the show partition fails .Is there a more organic way to identify the partition column names in a hive table. Any help would be highly appreciated
v_query="show partitions {}".format(table_name)
a=self.spark.sql(v_query)
val=a.rdd.map(list).first()
val1=''.join(val)
partition_list=[l.split('=')[0] for l in val1.split('/')]
Upvotes: 2
Views: 8149
Reputation: 2546
How to get an array of partition columns in Scala/Spark:
spark.catalog.listColumns(fullTableName)
.where(col("ispartition"))
.select("name")
.collect()
.map(_.getAs[String]("name"))
Feel free to change the above code to work with PySpark.
Upvotes: 1
Reputation: 4701
Above code will fail incase table is not partitioned. It will give you an error message like "pyspark.sql.utils.AnalysisException: u'SHOW PARTITIONS is not allowed on a table that is not partitioned"
you can use map operation on desc
command to get the partition column information.
>>> spark.sql("""desc db.newpartitiontable""").show(truncate=False)
+-----------------------+---------+-------+
|col_name |data_type|comment|
+-----------------------+---------+-------+
|city |string |null |
|state |string |null |
|country |string |null |
|tran_date |string |null |
|# Partition Information| | |
|# col_name |data_type|comment|
|tran_date |string |null |
+-----------------------+---------+-------+
partition_exists=spark.sql("""desc db.newpartitiontable""").rdd.map(lambda x:x[0]).filter(lambda x: x.startswith("# col_name")).collect()
if len(partition_exists)>0:
print("table is partitioned")
partition_col=spark.sql("""show partitions test_dev_db.newpartitiontable""").rdd.map(lambda x:x[0]).map(lambda x : [l.split('=')[0] for l in x.split('/')]).first()
print("Partition column is:",partition_col)
else:
print("table is not partitioned")
Upvotes: 3