Thelight

Reputation: 369

BigQuery long-term storage: list tables

As per the GCP documentation:

Long-term storage:

If a table is not edited for 90 consecutive days, the price of storage for that table automatically drops by approximately 50 percent. There is no degradation of performance, durability, availability, or any other functionality when a table is considered long term storage.

Is there any way to list all the tables in a BigQuery dataset that fall under this criterion?
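For reference, the 90-day clock for a single table can be read off its lastModifiedTime metadata. A minimal sketch with bq and jq (mydataset.mytable is a placeholder) that prints how many days ago a table was last modified:

# Days since the table was last modified (lastModifiedTime is epoch milliseconds)
bq show --format=json mydataset.mytable |
jq -r '(now * 1000 - (.lastModifiedTime | tonumber)) / 86400000 | floor'

If this prints 90 or more, the table's storage should be billed at the long-term rate.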

Upvotes: 0

Views: 724

Answers (1)

Graham Polley

Reputation: 14791

One way is to use some Bash magic. I've knocked this up quickly using Cloud Shell, which has jq pre-installed. It works by:

  1. Listing all datasets
  2. Listing each table in each dataset
  3. For each table, calling bq show to grab the metadata for that table
  4. Using jq to pull out the id and numLongTermBytes of each table
  5. numLongTermBytes is the number of bytes sitting in long-term storage. If it's 0, the table is entirely in active storage.

Not sure how this would work with partitioned & clustered tables (you'll need to test), but it should be enough to get you started at least. You could of course write a neater Python script or something, but I like the 'simplicity' of Bash :)

Note that this is a somewhat fragile/brittle solution.

# List all datasets, list every table in each dataset (4 at a time),
# fetch each table's metadata, then print "table_id,numLongTermBytes"
bq ls --format=json | jq -r '.[].id' |
xargs -n 1 -P 4 -I {} bq ls --format=json {} | jq -r '.[].id' |
xargs -n 1 -P 4 -I {} bq show --format=json {} |
jq -r '.id + "," + .numLongTermBytes'
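If you only want the tables that already have data in long-term storage, one possible tweak is to add a jq select at the end (assuming numLongTermBytes always comes back as a numeric string, which is how the API serializes int64 fields):

# Same pipeline, but keep only tables with non-zero long-term bytes
bq ls --format=json | jq -r '.[].id' |
xargs -n 1 -P 4 -I {} bq ls --format=json {} | jq -r '.[].id' |
xargs -n 1 -P 4 -I {} bq show --format=json {} |
jq -r 'select((.numLongTermBytes | tonumber) > 0) | .id + "," + .numLongTermBytes'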


Upvotes: 7
