eric_stat

Reputation: 11

How to access the comment in the hive table when using pyspark?

When I use PySpark to connect to Hive, I want to get the comment of a table. I used "DESCRIBE TABLE table_name", but it doesn't work:

sql = """(DESCRIBE TABLE table_name) as t"""
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "********") \
    .option("dbtable", sql) \
    .option("user", "username") \
    .option("password", "password") \
    .load()

error: line 1:33 missing GRPAH_PATH at '(' near "table_name" in statement

Upvotes: 1

Views: 1086

Answers (2)

Ashish Kumar

Reputation: 347

You can try something like:

df = spark.sql("describe extended db.my_tbl")
df.where("col_name = 'Table Properties'").collect()

This will give you the table comment(s) if one was set when the table was created.
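To turn the rows that `collect()` returns into just the comment string, you can filter the `DESCRIBE EXTENDED` output in plain Python. A minimal sketch, assuming the output is a list of `(col_name, data_type, comment)` rows as produced by `spark.sql("describe extended db.my_tbl").collect()`, and that the table-level comment appears in a row whose `col_name` is `'Comment'` (as in recent Spark versions; the exact row name can vary by Spark release):

```python
def extract_table_comment(describe_rows):
    """Pick the table-level comment out of DESCRIBE EXTENDED output.

    describe_rows: iterable of (col_name, data_type, comment) triples,
    e.g. spark.sql("describe extended db.my_tbl").collect().
    In the "# Detailed Table Information" section, the row whose
    col_name is 'Comment' carries the table comment in its data_type
    column; table properties land under 'Table Properties'.
    """
    for col_name, data_type, _ in describe_rows:
        if col_name == "Comment":
            return data_type
    return None  # no comment was set on the table
```

You would call it as `extract_table_comment(spark.sql("describe extended db.my_tbl").collect())`.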

Upvotes: 0

thePurplePython

Reputation: 2767

Spark uses Hive as its underlying metastore, so you only need Spark SQL to find the information you are looking for. You can also read the table in as a dataframe for further analysis. You would use jdbc to connect to an external RDBMS ... for example SQL Server, Oracle, Redshift, PostgreSQL, etc.

# specify your hive database
spark.sql("use default")

# get hive table stats
spark.sql("DESCRIBE TABLE spark_sql_table").show()
+-------------------+---------+-------+
|           col_name|data_type|comment|
+-------------------+---------+-------+
|  DEST_COUNTRY_NAME|   string|   null|
|ORIGIN_COUNTRY_NAME|   string|   null|
|              count|   bigint|   null|
+-------------------+---------+-------+

# read hive table as spark df
df = spark.table("spark_sql_table")

# confirm schema
df.printSchema()
root
 |-- DEST_COUNTRY_NAME: string (nullable = true)
 |-- ORIGIN_COUNTRY_NAME: string (nullable = true)
 |-- count: long (nullable = true)

# perform analysis
df.show()

Upvotes: 1
