Bahadır Bulut
Bahadır Bulut

Reputation: 13

How to create array of nested fields and arrays in BigQuery

I am trying to create a table in BigQuery according to a json schema which I will put in GCS and push to a pub/sub topic from there. I need to create some arrays and nested fields in order to achieve that.

By using struct and array_agg I can achieve arrays of struct but I couldn't figure out how to create struct of array.

Imagine that I have a json schema as below:

{
  "vacancies": {
    "id": "12",
    "timestamp": "2019-08-22T04:04:26Z",
    "version": "1.0",
    "positionOpening": {
      "documentId": {
        "value": "505"
      },
      "statusCode": "Closed",
      "registrationDate": "2014-05-07T16:11:22Z",
      "lastUpdated": "2014-05-07T16:14:56Z",
      "positionProfiles": [
        {
        "positionTitle": "Data Scientist for international company",
        "positionQualifications": [
          {
            "experienceSummary": [
              {"measure": {"value": "10","unitCode": "ANN"}},
              {"measure": {"value": "4","unitCode": "ANN"}}
            ],
            "educationRequirement": {
              "programs": ["Physics","Computer Science"],
              "programConcentrations": ["Data Analysis","Python Programming"]
            },
            "languageRequirement": [
              {
                "competencyName": "English",
                "requiredProficiencyLevel": {"scoresNumeric": [{"value": "100"},{"value": "95"}]}
              },
              {
                "competencyName": "French",
                "requiredProficiencyLevel": {"scoresNumeric": [{"value": "95"},{"value": "70"}]}
              }
            ]
          }
        ]
        }
      ]
    }
  }
}

How can I create a SQL query to get this as a result?

Thanks in advance for the help!

Upvotes: 1

Views: 4544

Answers (1)

tb.
tb.

Reputation: 788

You might have to build a temp table to do this.

This first create statement would take a denormalized table convert it to a table with an array of structs.

The second create statement would take that temp table and embed the array into a (array of) struct(s).

You could remove the internal struct from the first query, and array wrapper the second query to build a strict struct of arrays. But this should be flexibe enough that you can create an array of structs, a struct of arrays or any combination of the two as many times as you want up to the 15 levels deep that BigQuery allows you to max out at.

The final outcome of this could would be a table with one column (column1) of a standard datatype, as well as an array of structs called OutsideArrayOfStructs. That Struct has two columns of "standard" datatypes, as well as an array of structs called InsideArrayOfStructs.

 CREATE OR REPLACE TABLE dataset.tempTable as (
     select 
         column1,
         column2,
         column3,
         ARRAY_AGG(
                 STRUCT(
                     ArrayObjectColumn1,
                     ArrayObjectColumn2,
                     ArrayObjectColumn3
                 )
             ) as InsideArrayOfStructs
     FROM
         sourceDataset.sourceTable
     GROUP BY 
         column1,
         column2,
         column3 )

 CREATE OR REPLACE TABLE dataset.finalTable as (
     select 
         column1,
         ARRAY_AGG(
                 STRUCT(
                     column2,
                     column3,
                     InsideArrayOfStructs
                 )
        ) as OutsideArrayOfStructs
     FROM
         dataset.tempTable
     GROUP BY
         Column1 )

Upvotes: 0

Related Questions