SKumar

Reputation: 63

List GCP BigQuery tables older than 90 days

I am using the following Standard SQL query in BigQuery to list tables older than 90 days using table metadata.

DECLARE projects ARRAY<STRING>;
DECLARE schema_list ARRAY<STRING>;
DECLARE dt_list ARRAY<STRING>;
DECLARE i INT64 DEFAULT 0;
DECLARE query_string STRING;
SET projects = ['my-project-1', 'my-project-2', ..., 'my-project-n'];

# List datasets of the current project
SET schema_list = (
  SELECT
    ARRAY_AGG(schema_name)
  FROM
    INFORMATION_SCHEMA.SCHEMATA                    #1
);

#List Datasets of a Project
WHILE 
    i < ARRAY_LENGTH(projects) DO
        
        SET dt_list = ( "SELECT ARRAY_AGG(schema_name) FROM  UNNSET(projects) as proj,"|| proj[OFFSET(iter)] ||".INFORMATION_SCHEMA.SCHEMATA"                         #2
        );
        /*SET dt_list = ( " SELECT ARRAY_AGG(schema_name) FROM "  
                 || projects[OFFSET(iter)] ||".INFORMATION_SCHEMA.SCHEMATA"          #3
        
        );*/
        
        

        SET i = i+1;
END WHILE;

#List tables of a Dataset
WHILE
  i < ARRAY_LENGTH(dt_list) DO
    SET query_string = " SELECT dataset_id, table_id, ROUND(size_bytes/POW(10,9),2) AS size_gb, TIMESTAMP_MILLIS(creation_time) AS creation_time, TIMESTAMP_MILLIS(last_modified_time) AS last_modified_time, row_count, type FROM "
        || dt_list[OFFSET(i)] || ".__TABLES__";
    EXECUTE IMMEDIATE query_string;
    SET i = i + 1;
END WHILE;

I am able to get the list of tables of the current GCP project, with the last modified date, using query #1. When I try to get the same result using an array of projects (projects), I get errors like "Query error: Unrecognized name: proj" (for #2) and "Query error: Cannot coerce expression " SELECT ARRAY_AGG(schema_name) FROM " || projects[OFFSET(iter)] ||".INFORMATION_SCHEMA.SCHEMATA" to type ARRAY" (for #3).

My purpose is to list BigQuery tables older than 90 days (long-term storage) across an array of projects, using standard SQL. We currently have multiple projects, and I am planning to run this query from a single project instead of running it in each project individually.
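For clarity, this is the kind of per-project statement I am ultimately trying to generate dynamically; the project and dataset names below are just placeholders, and the 90-day cutoff on last_modified_time is the filter I am after:

SELECT
  dataset_id,
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified_time
FROM `my-project-1.my_dataset.__TABLES__`
WHERE TIMESTAMP_MILLIS(last_modified_time) < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY);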

Please help.

Upvotes: 3

Views: 743

Answers (1)

guillaume blaquiere

Reputation: 75725

In BigQuery, data moves to long-term storage after 90 days without updates, and it costs half as much. This rule is applied at the partition level.

That's why I recommend you have a look at the partitions INFORMATION_SCHEMA view:

SELECT * FROM `projectID.dataset.INFORMATION_SCHEMA.PARTITIONS`

There is a column that tells you whether a partition is in long-term storage or not, so you can immediately see whether you can delete that partition (rather than the whole table). You can improve your storage optimisation that way.
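For example, a query along these lines should surface only the long-term partitions; I am assuming the column is named storage_tier with a LONG_TERM value, so double-check the view's schema in your project:

SELECT table_name, partition_id, storage_tier, last_modified_time
FROM `projectID.dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE storage_tier = 'LONG_TERM'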

You still have the last-modified date if you don't want to rely on the long-term storage rule and prefer a purge threshold different from 90 days without updates.
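For instance, a sketch with a custom cutoff (180 days here, purely as an example) on the same view:

SELECT table_name, partition_id, last_modified_time
FROM `projectID.dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE last_modified_time < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 180 DAY)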

Upvotes: 0
