Using BigQuery to format an output based on a JSON representation

Question

Good day, I'm working with BigQuery to construct an output in the format of

{"selectionListId":"SelectionListName","keys":["EVENT_TYPE_CODE","EVENT_TYPE_DESCRIPTION","METADATA"],"fileType":"dataTopicSelectionList"}

I have the following query, but I'm not sure if my METADATA struct is formatted correctly to produce the right output.

#standardSQL
-- Create the temporary table
CREATE OR REPLACE TEMP TABLE temp_table (
  selectionListId STRING,
  keys ARRAY>>,
  fileType STRING
);

-- Insert data into the temporary table
INSERT INTO temp_table (
  selectionListId,
  keys,
  fileType
)
SELECT 
  'SelectionListName' AS selectionListId,
  ARRAY>>    
    [
      STRUCT('EVENT_TYPE_CODE', 'STRING', []),
      STRUCT('EVENT_TYPE_DESCRIPTION', 'STRING', []),
      STRUCT('METADATA', 'STRUCT', 
        [STRUCT('CountedFrom', 'DATE', []),
         STRUCT('To_', 'DATE', []),
         STRUCT('Count', 'INT64', [])])
    ] AS keys,
  'dataTopicSelectionList' AS fileType
FROM (
  SELECT 
    Column1 AS EVENT_TYPE_CODE,
    Column2 AS EVENT_TYPE_DESCRIPTION,
    MIN(DATE(TransferLoc.EVENT_DTM)) AS CountedFrom,
    MAX(DATE(TransferLoc.EVENT_DTM)) AS To_,
    COUNT(TransferLoc.EVENT_DTM) AS Count
  FROM `projectID.TransferLoc` TransferLoc
  INNER JOIN `projectID.TransferEventType` TransferEventType
    ON TransferLoc.EVENT_TYPE_DK = TransferEventType.EVENT_TYPE_DK
  WHERE 
    TransferLoc.EVENT_DTM > '1900-01-01 00:00:00' AND 
    TransferLoc.EVENT_DTM < CURRENT_DATE()
  GROUP BY 
    TransferEventType.EVENT_TYPE_CODE,
    TransferEventType.EVENT_TYPE_DESCRIPTION
);

After the temp table is created, the schema is as follows:

However, on the insert I get this error:

Query error: Array element type STRUCT>>> does not coerce to STRUCT>

Any help/suggestion would be appreciated

I tried changing the METADATA structs. Placing the METADATA as an ARRAY>

D4M3 · Accepted Answer

I was able to get the correct schema to output the JSON representation correctly. I had to create 2 temp tables; along the way ran into some aggregation issues which was resolved by using the 2 tables and then join on temp. Here's my approach:

Step 1: Create temp table to prevent aggregation error

  CREATE OR REPLACE TEMP TABLE aggregated_data AS (
  SELECT     
    dataset1.EVENT_TYPE_DK,
    MIN(DATE(dataset1.EVENT_DTM)) AS `from`,
    MAX(DATE(dataset1.EVENT_DTM)) AS `to`,
    COUNT(dataset1.EVENT_DTM) AS `count`
  FROM `project1.dataset1` dataset1
  WHERE dataset1.EVENT_DTM > '1900-01-01 00:00:00'
    AND dataset1.EVENT_DTM < CURRENT_DATE()
  GROUP BY dataset1.EVENT_TYPE_DK
);

Step 2: Query the aggregated_data, then join

  CREATE OR REPLACE TEMP TABLE temp_table AS (
  SELECT   
    'SelectionListName' AS selectionListId,
    ARRAY_AGG(
      STRUCT( 
        dataset2.EVENT_TYPE_CODE, 
        dataset2.EVENT_TYPE_DESCRIPTION, 
        METADATA
      )
    ) AS keys,
    'dataTopicSelectionList' AS fileType
  FROM `project1.dataset2` dataset2
  JOIN (
    SELECT EVENT_TYPE_DK, ARRAY_AGG(STRUCT(`from`, `to`, `count`)) AS METADATA
    FROM aggregated_data
    GROUP BY EVENT_TYPE_DK
  ) AS aggregated_data_arr
    ON dataset2.EVENT_TYPE_DK = aggregated_data_arr.EVENT_TYPE_DK
  GROUP BY 1,3
);

Using BigQuery to format an output based on a JSON representation

Answers (1)

Related Questions