koky
koky

Reputation: 11

BigQuery Select unused tables in the past 90 days, SQL

I would like to select the tables that are not used in the past 90 days in BigQuery.

I have a table called "bq_query_audit_tables" and it contains lists of querys, the create time of the query, and the table that the user query to.

Here's what I do:

SELECT
    max(createTime) as max_createTime, table_name, count(*)
FROM
    bq_audits.bq_query_audit_tables AS audit_tables
LEFT JOIN
    region-us.INFORMATION_SCHEMA.TABLES AS tables
ON tables.table_catalog= audit_tables.projectId AND tables.table_schema= audit_tables.datasetId AND tables.table_name= audit_tables.tableId
    AND CASE tables.table_type WHEN 'BASE TABLE' THEN audit_tables.table_type ='table' WHEN 'VIEW' THEN  audit_tables.table_type ='view' END
GROUP BY
    table_name, createTime
HAVING
    MAX(DATE(createTime)) <= CURRENT_DATE() - 90
ORDER BY 
    count(*) DESC

But the result contains the tables that have been used everyday.

Here's what I do:

SELECT
    max(createTime) as max_createTime, table_name, count(*)
FROM
    bq_audits.bq_query_audit_tables AS audit_tables
LEFT JOIN
    region-us.INFORMATION_SCHEMA.TABLES AS tables
ON tables.table_catalog= audit_tables.projectId AND tables.table_schema= audit_tables.datasetId AND tables.table_name= audit_tables.tableId
    AND CASE tables.table_type WHEN 'BASE TABLE' THEN audit_tables.table_type ='table' WHEN 'VIEW' THEN  audit_tables.table_type ='view' END
GROUP BY
    table_name, createTime
HAVING
    MAX(DATE(createTime)) <= CURRENT_DATE() - 90
ORDER BY 
    count(*) DESC

But the result contains the tables that have been used everyday.

Upvotes: 1

Views: 1308

Answers (1)

Samuel
Samuel

Reputation: 3528

This list all tables not in query the last days.

SELECT table_name,table_schema,max(last_query_time) as last_query_time
FROM `-->project.dataset<--.INFORMATION_SCHEMA.TABLES` AS tables
Left join
(Select creation_time  as last_query_time ,x
FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_USER  ,unnest(referenced_tables) as x)
on table_catalog=x.project_id and x.dataset_id=table_schema and x.table_id=table_name
group by 1,2
having last_query_time is null or date(last_query_time)<CURRENT_DATE() - 90

Upvotes: 1

Related Questions