Reputation: 11
When I use PySpark to connect to Hive, I want to get the comment of the table. I used "DESCRIBE TABLE table_name", but it doesn't work:
sql = """(DESCRIBE TABLE table_name) as t"""
jdbcDF = spark.read \
.format("jdbc") \
.option("url", "********") \
.option("dbtable", sql) \
.option("user", "username") \
.option("password", "password") \
.load()
error: line 1:33 missing GRPAH_PATH at '(' near "table_name" in statement
Upvotes: 1
Views: 1086
Reputation: 347
You can try something like:
df = spark.sql("describe extended db.my_tbl")
df.where("col_name = 'Table Properties'").collect()
This would then give you the table comment(s), if one was set when the table was created.
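If you want the comment value itself rather than the raw rows, here is a small follow-on sketch; the exact row label varies by Spark version, so it checks both 'Comment' and 'Table Properties' (db.my_tbl is the table from above, and the value sits in the data_type column of the DESCRIBE output):
# collect the DESCRIBE EXTENDED output and pull out the table-level comment;
# depending on the Spark version the value appears in a 'Comment' row or inside
# the 'Table Properties' row, so check both
rows = spark.sql("describe extended db.my_tbl").collect()
for r in rows:
    if r.col_name in ("Comment", "Table Properties"):
        print(r.col_name, "->", r.data_type)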
Upvotes: 0
Reputation: 2767
Spark uses Hive as its underlying metastore, so you only need Spark SQL to find the information you are looking for. You can also read the table in as a DataFrame for further analysis. You would use jdbc to connect to an external RDBMS ... for example something like SQL Server, Oracle, Redshift, PostgreSQL, etc.
# specify your hive database
spark.sql("use default")
# get hive table stats
spark.sql("DESCRIBE TABLE spark_sql_table").show()
+-------------------+---------+-------+
| col_name|data_type|comment|
+-------------------+---------+-------+
| DEST_COUNTRY_NAME| string| null|
|ORIGIN_COUNTRY_NAME| string| null|
| count| bigint| null|
+-------------------+---------+-------+
# read hive table as spark df
df = spark.table("spark_sql_table")
# confirm schema
df.printSchema()
root
|-- DEST_COUNTRY_NAME: string (nullable = true)
|-- ORIGIN_COUNTRY_NAME: string (nullable = true)
|-- count: long (nullable = true)
# perform analysis
df.show()
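For contrast, the jdbc reader mentioned above is for pulling tables from an external RDBMS, not Hive. A minimal sketch, assuming the matching JDBC driver (here PostgreSQL) is on the Spark classpath; the host, database, table, and credentials are placeholders you would replace:
# read a table from an external PostgreSQL database over JDBC
# (url, dbtable, user, and password below are placeholder values)
jdbc_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://your-host:5432/your_db") \
    .option("dbtable", "public.your_table") \
    .option("user", "your_user") \
    .option("password", "your_password") \
    .load()

jdbc_df.printSchema()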
Upvotes: 1