sortof
sortof

Reputation: 541

How to convert array to string value

Hello i am trying to get my queries log cost, i get the total amount but when i try to break it down per dataset i get this error:

' Cannot access field datasetId on a value with type ARRAY> at '

this is my query i am trying to run:

WITH
  data AS (
  SELECT
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent,
    (
    SELECT
      ARRAY_TO_STRING((
        SELECT
          ARRAY_AGG(datasetId)
        FROM
          UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables.datasetId) ))) AS datasetIds
  FROM
    `kkk111.bq_audit_log_export.cloudaudit_googleapis_com_data_access_20190206` )
SELECT
  datasetIds,
  FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM
  data
WHERE
  jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY
  datasetIds
ORDER BY
  Estimated_USD_Cost DESC

I am using Standard SQL Dialect
how do i cast this field:

protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables.datasetId

from array to a string ? what am i missing ? Thanks.

Upvotes: 0

Views: 6513

Answers (2)

Elliott Brossard
Elliott Brossard

Reputation: 33745

You need to UNNEST the outer array in order to select the dataset ID inside:

SELECT
  ARRAY_TO_STRING((
    SELECT ARRAY_AGG(datasetId)
    FROM UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables)
    ), ',') AS datasetIds
FROM ...

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173038

Below is for BigQuery Standard SQL

#standardSQL   
WITH data AS (
  SELECT 
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent,
    ref.datasetId AS datasetId
  FROM `kkk111.bq_audit_log_export.cloudaudit_googleapis_com_data_access_20190206`,
  UNNEST(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.referencedTables) ref 
)
SELECT
  datasetId,
  FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/POWER(2, 40))) AS Estimated_USD_Cost
FROM data
WHERE jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY datasetId
ORDER BY Estimated_USD_Cost DESC   

As you can see there, obviously you need to UNNEST referencedTables ARRAY but also you need to make sure your final calculation of Cost is as close to correct one as possible. The same query can reference multiple tables from the same dataset, so you better have DISTINCT in your CTE. But also, same query can reference tables from multiple datasets - so in this same billing bytes will be attributed to multiple datasets, so you will have overestimation! I don't know your exact intent - but you might want to introduce some logic to distribute the cost among the referenced datasets.

Upvotes: 1

Related Questions