harlow
harlow

Reputation: 864

How to change the col type of a BigQuery repeated record

I'm trying to change a col type of a repeated record from STRING to TIMESTAMP. There are a few suggestions from BQ docs here (manually-changing-schemas). However, I'm running into issues with each of the recommended suggestions.

Here is an example schema:

{
  'name' => 'id',
  'type' => 'STRING',
  'mode' => 'REQUIRED'
},
{
  'name' => 'name',
  'type' => 'STRING',
  'mode' => 'REQUIRED'
},
// many more fields including nested records and repeated records
{
  'name' => 'locations',
  'type' => 'RECORD',
  'mode' => 'REPEATED',
  'fields' => [
    {
      'name' => 'city',
      'type' => 'STRING',
      'mode' => 'REQUIRED'
    },
    {
      'name' => 'updated_at',
      'type' => 'STRING',   // ** want this as TIMESTAMP **
      'mode' => 'REQUIRED'
    },
  ]
}

Issue with using Query:

I think we'd have to UNNEST the repeated record, cast the field to timestamp for each repeated record and then somehow recreate the row for insertion into the new table.

Issue with exporting table as JSON:

When exporting the table in JSON format it exports raw json representations of the data (with maps and dictionaries, as we'd expect it to).

However, we're not able to import that raw data back into BQ:

BigQuery does not support maps or dictionaries in JSON. For example, "product_categories": {"my_product": 40.0} is not valid, but "product_categories": {"column1": "my_product" , "column2": 40.0} is valid.

https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json#limitations

Any suggestions would be much appreciated!

Upvotes: 2

Views: 3659

Answers (1)

Yun Zhang
Yun Zhang

Reputation: 5518

Below answer is based on that:REPEATED RECORD type in BigQuery StandardSQL is represented as type ARRAY<STRUCT<f1 f1_type, f2 f2_type ... >>.

It is not my favorite since you have to specify full column list. Maybe there is a better way.

#standardSQL
-- Build sample data, try to mimic what's in question.
CREATE OR REPLACE TABLE
  <your_dataset>.sample_table AS
SELECT name, 
       array<struct<city string, update_at string>>[("SFO", "2011-1-1"), ("SEA", "2022-2-2")] 
       as locations
FROM UNNEST(['Name1', "Name2", "Name3"]) as name;

enter image description here enter image description here

Then below SQL will convert update_at column into DATE and save to new table (or the same one if you'd like).

#standardSQL
CREATE OR REPLACE TABLE
  <your_dataset>.output_table AS
SELECT * REPLACE (
   ARRAY(SELECT AS STRUCT * REPLACE(CAST(update_at AS DATE) AS update_at)
         FROM UNNEST(locations)) 
   AS locations 
   )
FROM
  <your_dataset>.sample_table;

enter image description here

Upvotes: 7

Related Questions