Reputation: 541
Hello i am trying to get my queries log cost, i get the total amount but when i try to break it down per dataset i get this error:
' Cannot access field datasetId on a value with type ARRAY> at '
this is my query i am trying to run:
WITH
data AS (
SELECT
protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent,
(
SELECT
ARRAY_TO_STRING((
SELECT
ARRAY_AGG(datasetId)
FROM
UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables.datasetId) ))) AS datasetIds
FROM
`kkk111.bq_audit_log_export.cloudaudit_googleapis_com_data_access_20190206` )
SELECT
datasetIds,
FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM
data
WHERE
jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY
datasetIds
ORDER BY
Estimated_USD_Cost DESC
I am using Standard SQL Dialect
how do i cast this field:
protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables.datasetId
from array to a string ? what am i missing ? Thanks.
Upvotes: 0
Views: 6513
Reputation: 33745
You need to UNNEST the outer array in order to select the dataset ID inside:
SELECT
ARRAY_TO_STRING((
SELECT ARRAY_AGG(datasetId)
FROM UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables)
), ',') AS datasetIds
FROM ...
Upvotes: 1
Reputation: 173038
Below is for BigQuery Standard SQL
#standardSQL
WITH data AS (
SELECT
protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent,
ref.datasetId AS datasetId
FROM `kkk111.bq_audit_log_export.cloudaudit_googleapis_com_data_access_20190206`,
UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables) ref
)
SELECT
datasetId,
FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM data
WHERE jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY datasetId
ORDER BY Estimated_USD_Cost DESC
As you can see there, obviously you need to UNNEST referencedTables ARRAY but also you need to make sure your final calculation of Cost is as close to correct one as possible. The same query can reference multiple tables from the same dataset, so you better have DISTINCT in your CTE. But also, same query can reference tables from multiple datasets - so in this same billing bytes will be attributed to multiple datasets, so you will have overestimation! I don't know your exact intent - but you might want to introduce some logic to distribute the cost among the referenced datasets.
Upvotes: 1