Reputation: 409
I would like to drop Databricks SQL database tables if they were created more than 30 days ago. How do I get a table's creation datetime from Databricks?
Thanks
Upvotes: 1
Views: 1182
Reputation: 716
Assuming your table is a Delta table:
You can use DESCRIBE HISTORY <database>.<table> to retrieve all transactions made to that table, including their timestamps. According to the Databricks documentation, history is only retained for 30 days by default. Depending on how you plan to implement your solution, that may just work.
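For example, a minimal Scala sketch (the names my_db.my_table are placeholders), taking the earliest retained transaction timestamp as an approximation of the creation time:

    import org.apache.spark.sql.functions.min

    // DESCRIBE HISTORY returns one row per transaction, with a `timestamp` column.
    val history = spark.sql("DESCRIBE HISTORY my_db.my_table")

    // The earliest retained timestamp approximates the creation time, but only
    // if history has not yet been truncated by the retention period.
    val earliestTs = history.agg(min("timestamp")).first().getTimestamp(0)

The caveat: once a table is older than the retention window, the creation transaction itself may no longer appear in the history.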
Upvotes: 0
Reputation: 13538
Given a tableName, the easiest way to get the creation time is as follows:
import org.apache.spark.sql.catalyst.TableIdentifier

// Look up the table's metadata in the session catalog; createTime is
// the creation time in epoch milliseconds.
val createdAtMillis = spark.sessionState.catalog
  .getTempViewOrPermanentTableMetadata(TableIdentifier(tableName))
  .createTime
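Since createTime is in epoch milliseconds, the 30-day check from the question is straightforward (a quick sketch; the 30-day threshold is taken from the question):

    import java.time.{Duration, Instant}

    // Age of the table, measured from its catalog creation time.
    val age = Duration.between(Instant.ofEpochMilli(createdAtMillis), Instant.now())
    val olderThan30Days = age.toDays > 30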
getTempViewOrPermanentTableMetadata() returns a CatalogTable that contains information such as:
CatalogTable(
Database: default
Table: dimension_npi
Owner: root
Created Time: Fri Jan 10 23:37:18 UTC 2020
Last Access: Thu Jan 01 00:00:00 UTC 1970
Created By: Spark 2.4.4
Type: MANAGED
Provider: parquet
Num Buckets: 8
Bucket Columns: [`npi`]
Sort Columns: [`npi`]
Table Properties: [transient_lastDdlTime=1578699438]
Location: dbfs:/user/hive/warehouse/dimension_npi
Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Schema: root
|-- npi: integer (nullable = true)
...
)
You can list all tables in a database using sessionCatalog.listTables(database).
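Putting it together, here is a hedged end-to-end sketch for the original question: drop every table in a database that was created more than 30 days ago. The database name is a placeholder, and actually issuing DROP TABLE is an assumption; consider logging instead as a dry run first.

    import java.time.{Duration, Instant}

    val targetDb = "default"  // hypothetical database name
    val catalog = spark.sessionState.catalog

    catalog.listTables(targetDb).foreach { ident =>
      // createTime is epoch milliseconds from the catalog metadata.
      val createdAtMillis =
        catalog.getTempViewOrPermanentTableMetadata(ident).createTime
      val ageDays =
        Duration.between(Instant.ofEpochMilli(createdAtMillis), Instant.now()).toDays
      if (ageDays > 30) {
        // Dropping for real; replace with a log/print statement for a dry run.
        spark.sql(s"DROP TABLE IF EXISTS ${ident.quotedString}")
      }
    }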
There are alternative ways of accomplishing the same thing, but they take much more effort and risk breaking when Spark's behavior changes: poking around table metadata using SQL and/or traversing the locations where tables are stored and looking at file timestamps. That's why it's best to go through the catalog APIs.
Hope this helps.
Upvotes: 1