Reputation: 13920
There are important metadata registered in the tables by
create table t1 (
column1 type COMMENT '... the comments1...',
column2 type COMMENT '... the comments2...'
...
);
in our database hundreds of tables, and hundreds of column's COMMENT
for each table.
We need Spark or HQL statement to obtain SQL column comments.
How to do some serious information retrieval about it? I need it as a first class table,... Or, if it is impossible, a JSON or standard MIME CSV file with all metadata.
NOTES:
We use both, direct Hive SQL and Spark, so any answer (HQL or Scala/Spark script) is valid.
Not need to scrape SHOW/DESCRIBE TABLES command, need serious solution.
This question is similar, but restricted to MySQL... The only clue is to do some direct acess to the Metastore (in our config seems Derby SQL schema - no standard).
This other question also about "serious information retrieval" of metadata.
Upvotes: 3
Views: 4367
Reputation: 359
In case you use PySpark, you can do the following:
for col_schema in df.schema:
print(col_schema.metadata['comment'])
Upvotes: 2
Reputation: 6897
You can retrieve the comments through the DataFrame schema using StructField.getComment()
For example if you want to manipulate the comments as a dataframe
// create a demo table
spark.sql("create table t1 (id long comment 'this is an example comment')")
// load table as dataframe
val t1 = spark.table("t1")
// load column comments as dataframe
val commentsDf1 = t1.schema.map(f => (f.name,f.getComment)).toDF("name","comment")
// check that everything is loaded correctly
commentsDf1.show(false)
+----+--------------------------+
|name|comment |
+----+--------------------------+
|id |this is an example comment|
+----+--------------------------+
If you want to use SQL to access these comments, you can use DESCRIBE <tablename>
to achieve the same result:
val commentsDf2 = spark.sql("describe t1")
commentsDf2.show(false)
+--------+---------+--------------------------+
|col_name|data_type|comment |
+--------+---------+--------------------------+
|id |bigint |this is an example comment|
+--------+---------+--------------------------+
Both, commentsDf1
and commentsDf2
, are first-class citizens: commentsDf2
is tabular and commentsDf1
a little bit more richer and complex.
Upvotes: 4