WJA
WJA

Reputation: 7004

How can I monitor incurred BigQuery billings costs (jobs completed) by table/dataset in real-time?

The biggest chunk of my BigQuery billing comes from query consumption. I am trying to optimize this by understanding which datasets/tables consume the most.

I am therefore looking for a way to track my BigQuery usage, but ideally something that is more in realtime (that I don't have to wait a day before I get the final results). The best way would be for instance how much each table/dataset consumed in the last hour.

So far I managed to find the Dashboard Monitoring but this only allows to display the queries in flight per project and the stored bytes per table, which is not what I am after.

What other solutions are there to retrieve this kind of information?

Upvotes: 1

Views: 2231

Answers (2)

Hirnhamster
Hirnhamster

Reputation: 7439

It might be easier to use the INFORMATION_SCHEMA.JOBS_BY_* views because you don't have to set up the stackdriver logging and can use them right away.

Example taken & modified from How to monitor query costs in Google BigQuery

DECLARE gb_divisor INT64 DEFAULT 1024*1024*1024;
DECLARE tb_divisor INT64 DEFAULT gb_divisor*1024;
DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5;
DECLARE cost_factor FLOAT64 DEFAULT cost_per_tb_in_dollar / tb_divisor;

SELECT
 ROUND(SUM(total_bytes_processed) / gb_divisor,2) as bytes_processed_in_gb,
 ROUND(SUM(IF(cache_hit != true, total_bytes_processed, 0)) * cost_factor,4) as cost_in_dollar,
 user_email,
FROM (
  (SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
  UNION ALL
  (SELECT * FROM `other-project.region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
)
WHERE
  DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) and CURRENT_DATE()
GROUP BY 
  user_email

Some caveats:

  • you need to UNION ALL all of the projects that you use explicitly
  • JOBS_BY_USER did not work for me on my private account (supposedly because me login email is @googlemail and big query stores my email as @gmail`)
  • the WHERE condition needs to be adjusted for your billing period (instead of the last 30 days)
  • doesn't provide the "bytes billed" information, so we need to determine those based on the cache usage
  • doesn't include the "if less than 10MB use 10MB" condition
  • data is only retained for the past 180 days
  • DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5; reflects only US costs - other regions might have different costs - see https://cloud.google.com/bigquery/pricing#on_demand_pricing
  • you can only query one region at a time

Upvotes: 1

norbjd
norbjd

Reputation: 11277

Using Stackdriver logs, you could create a sink with Pub/Sub topic as target for real-time analysis that filter only BigQuery logs like this :

resource.type="bigquery_resource" AND
proto_payload.method_name="jobservice.jobcompleted" AND
proto_payload.service_data.job_completed_event.job.job_statistics.total_billed_bytes:*

(see example queries here : https://cloud.google.com/logging/docs/view/query-library?hl=en_US#bigquery-filters)

You could create the sink on a specific project, a folder or even an organization. This will retrieve all the queries done in BigQuery in that specific project, folder or organization.

The field proto_payload.service_data.job_completed_event.job.job_statistics.total_billed_bytes will give you the number of bytes processed by the query.

Based on on-demand BigQuery pricing (as of now, $5/TB for most regions, but check for your own region), you could easily estimate in real-time the billing. You could create a Dataflow job that aggregates the results in BigQuery, or simply consume the destination Pub/Sub topic with any job you want to make the pricing calculation :

jobPriceInUSD = totalBilledBytes / 1_000_000_000_000 * pricePerTB

because 1 TB = 1_000_000_000_000 B. As I said before, pricePerTB depends on regions (see : (https://cloud.google.com/bigquery/pricing#on_demand_pricing for the exact price). For example, as of time of writing :

  • $5/TB for us-east1
  • $6/TB for asia-northeast1
  • $9/TB for southamerica-east1

Also, for each month, as of now, the 1st TB is free.

Upvotes: 1

Related Questions