user2135521

Reputation: 81

How to create nested JSON format table from flat table in BigQuery?

I have a wide, flat table stored in Google BigQuery in a format similar to the following:

log_date:integer,sessionid:integer,computer:string,ip:string,event_id:integer,amount:float

I'm trying to recreate this table in a hierarchical nested format with two nested levels, as follows:

[
  {
    "name": "log_date",
    "type": "integer"
  },
  {
    "name": "session",
    "type": "record",
    "mode": "repeated",
    "fields": [
      {
        "name": "sessionid",
        "type": "integer"
      },
      {
        "name": "computer",
        "type": "string"
      },
      {
        "name": "ip",
        "type": "string"
      },
      {
        "name": "event",
        "type": "record",
        "mode": "repeated",
        "fields": [
          {
            "name": "event_id",
            "type": "integer"
          },
          {
            "name": "amount",
            "type": "float"
          }
        ]
      }
    ]
  }
]

What is the best way to generate the JSON-formatted data file from a BigQuery table? Is there a different, faster approach than the following?

1. Download the table into an external CSV file.
2. Build the JSON records and write them to an external file.
3. Upload the external JSON file into a new BigQuery table.

Is there a direct process that generates the JSON from existing tables?

Thank you, H

Upvotes: 5

Views: 2521

Answers (2)

Steven Ensslen

Reputation: 1396

This can be accomplished with ARRAY_AGG in standard SQL.

Note that nesting in more than one layer requires common table expressions, because an ARRAY_AGG cannot directly contain another ARRAY_AGG. Each level of nesting is aggregated in its own step, from the innermost outward.

WITH DATA AS (
 SELECT 1 AS log_date, 10 AS sessionid, 'a' AS computer, '1.2.3.4' AS ip, 100 AS event_id, 1 AS amount
 UNION ALL SELECT 1 AS log_date, 11 AS sessionid, 'b' AS computer, '1.2.3.5' AS ip, 101 AS event_id, 2 AS amount
 UNION ALL SELECT 1 AS log_date, 11 AS sessionid, 'b' AS computer, '1.2.3.5' AS ip, 102 AS event_id, 3 AS amount
 UNION ALL SELECT 2 AS log_date, 20 AS sessionid, 'a' AS computer, '1.2.3.4' AS ip, 200 AS event_id, 4 AS amount
 UNION ALL SELECT 2 AS log_date, 20 AS sessionid, 'a' AS computer, '1.2.3.4' AS ip, 201 AS event_id, 5 AS amount
 UNION ALL SELECT 2 AS log_date, 21 AS sessionid, 'c' AS computer, '1.2.3.6' AS ip, 202 AS event_id, 6 AS amount ),
inner_Aggregate AS (
  SELECT
    log_date,
    sessionid,
    computer,
    ip,
    ARRAY_AGG(STRUCT(event_id, amount)) AS event
  FROM
    DATA
  GROUP BY
    log_date,
    sessionid,
    computer,
    ip )
SELECT
  log_date,
  ARRAY_AGG(STRUCT(sessionid, computer, ip, event )) AS session
FROM
  inner_Aggregate
GROUP BY
  log_date
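
To make the shape of the result easier to picture, here is a minimal client-side sketch in Python of the same two-level grouping. It is not BigQuery code and is not part of the query above. The function name nest_rows is hypothetical, and the sample rows simply mirror the DATA rows in the query.

```python
import json


def nest_rows(rows):
    """Group flat rows into log_date -> session[] -> event[] (hypothetical helper)."""
    dates = {}  # log_date -> {(sessionid, computer, ip) -> session dict}
    for r in rows:
        sessions = dates.setdefault(r["log_date"], {})
        key = (r["sessionid"], r["computer"], r["ip"])
        session = sessions.setdefault(key, {
            "sessionid": r["sessionid"],
            "computer": r["computer"],
            "ip": r["ip"],
            "event": [],
        })
        # Inner level: one event entry per flat row.
        session["event"].append(
            {"event_id": r["event_id"], "amount": float(r["amount"])}
        )
    # Outer level: one record per log_date holding its sessions.
    return [
        {"log_date": log_date, "session": list(sessions.values())}
        for log_date, sessions in dates.items()
    ]


# Sample rows copied from the DATA common table expression.
columns = ("log_date", "sessionid", "computer", "ip", "event_id", "amount")
rows = [dict(zip(columns, values)) for values in [
    (1, 10, "a", "1.2.3.4", 100, 1),
    (1, 11, "b", "1.2.3.5", 101, 2),
    (1, 11, "b", "1.2.3.5", 102, 3),
    (2, 20, "a", "1.2.3.4", 200, 4),
    (2, 20, "a", "1.2.3.4", 201, 5),
    (2, 21, "c", "1.2.3.6", 202, 6),
]]

nested = nest_rows(rows)
for record in nested:
    # One JSON object per line, i.e. newline-delimited JSON.
    print(json.dumps(record))
```

Each printed line is one top-level record with a repeated session field, and each session carries its own repeated event field, matching the schema in the question.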

Upvotes: 0

Jordan Tigani

Reputation: 26637

There isn't currently a way to automatically transform the data to a nested format. If you'd like to get the data out in JSON format rather than CSV, you can use the bq extract command with the --destination_format flag set to NEWLINE_DELIMITED_JSON, e.g.:

bq extract \
    --destination_format=NEWLINE_DELIMITED_JSON \
    yourdataset.table \
    gs://your_bucket/result*.json 
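
Newline-delimited JSON holds one JSON object per line, so the exported files can be read record by record. The following is a rough Python sketch of that, assuming the files have already been copied locally. The function name read_ndjson is hypothetical, and the two sample lines are hand-written in the flat shape from the question, so the exact encoding of values in a real export may differ.

```python
import io
import json


def read_ndjson(stream):
    """Yield one dict per non-empty line of a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hand-written sample standing in for an exported file (assumed values).
sample = io.StringIO(
    '{"log_date": 1, "sessionid": 10, "computer": "a", '
    '"ip": "1.2.3.4", "event_id": 100, "amount": 1.0}\n'
    '{"log_date": 1, "sessionid": 11, "computer": "b", '
    '"ip": "1.2.3.5", "event_id": 101, "amount": 2.0}\n'
)

records = list(read_ndjson(sample))
for rec in records:
    print(rec["sessionid"], rec["event_id"], rec["amount"])
```

With a real file, the io.StringIO object would be replaced by an open file handle.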

Upvotes: 1
