mikebridge
mikebridge

Reputation: 4585

Can I convert a stringified JSON array back to a BigQuery strucutre?

I'm trying to take a STRING field that contains a nested JSON structure from a table called my_old_table, extract a nested array called "alerts" from it, then insert it into a column in a new table called my_new_table. The new column is defined as:

ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>

I'm using this SQL:

INSERT INTO my_dataset.my_table(
   id, alerts)
SELECT id, JSON_EXTRACT(extra, "$.alerts") AS content_alerts
FROM my_dataset.my_old_table

This gives me:

Query column 2 has type STRING which cannot be inserted into column content_alerts, which has type ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>> at [4:1]

I don't see a way of parsing the extracted string this back to a structure.... Is there another way to do this?

Edit:

The original value is a json string that looks like this:

{
  "id": "bar123",
  "value": "Test",
  "title": "Test",
  "alerts": [
    {
      "id": "abc123",
      "title": "Foo",
      "created": "2020-01-17T23:18:59.769908Z"
    },
    {
      "id": "abc124",
      "title": "Accepting/Denying Claims",
      "created": "2020-01-17T23:18:59.769908Z"
    }
  ]
}

I want to extract $.alerts and insert it into the ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>> somehow.

Edit #2

To clarify, this reproduces the issue:

CREATE TABLE insights.my_table
(
  id string,
  alerts ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
);

CREATE TABLE insights.my_old_table
(
   id string,
   field STRING
);

INSERT INTO insights.my_old_table(id, field)
VALUES("1", "{\"id\": \"bar123\",\"value\": \"Test\",\"title\": \"Test\",\"alerts\":[{\"id\": \"abc123\",\"title\": \"Foo\",\"created\": \"2020-01-17T23:18:59.769908Z\"},{\"id\": \"abc124\",\"title\": \"Accepting/Denying Claims\",\"created\": \"2020-01-17T23:18:59.769908Z\"}]}");

Based on the above setup, I don't know how to extract "alerts" from the STRING field and insert it into the STRUCT field. I thought I could add a JSON PARSE step in there but I don't see any BigQuery feature for that. Or else there would be a way to manipulate JSON as a STRUCT but I don't see that either. As a result, this is as close as I could get:

INSERT INTO insights.my_table(id, alerts)
SELECT id, JSON_EXTRACT(field, "$.alerts") AS alerts FROM insights.my_old_table

I'm sure there's something I'm missing here.

Upvotes: 2

Views: 980

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173171

Below for BigQuery Standard SQL

#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(input).map(x=>JSON.stringify(x));
"""; 
)
SELECT 
  JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
  ARRAY(
    SELECT AS STRUCT 
      JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
      JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
      TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
    FROM UNNEST(JsonToItems(JSON_EXTRACT(extra, "$.alerts"))) alert 
  ) AS alerts,
FROM `project.dataset.my_old_table`   

You can test, play with above using sample data from your question as in example below

#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(input).map(x=>JSON.stringify(x));
"""; 
WITH `project.dataset.my_old_table` AS (
  SELECT '''
{
  "id": "bar123",
  "value": "Test",
  "title": "Test",
  "alerts": [
    {
      "id": "abc123",
      "title": "Foo",
      "created": "2020-01-17T23:18:59.769908Z"
    },
    {
      "id": "abc124",
      "title": "Accepting/Denying Claims",
      "created": "2020-01-17T23:18:59.769908Z"
    }
  ]
}
''' extra
)
SELECT 
  JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
  ARRAY(
    SELECT AS STRUCT 
      JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
      JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
      TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
    FROM UNNEST(JsonToItems(JSON_EXTRACT(extra, "$.alerts"))) alert 
  ) AS alerts,
FROM `project.dataset.my_old_table`    

with result

enter image description here

Obviously, you can then use this in your INSERT INTO my_dataset.my_table statement

Upvotes: 2

Ksign
Ksign

Reputation: 817

You can parse the extracted string back to a BigQuery structure like so:

SELECT STRUCT(ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
[('Rick', 'Scientist', '2020-01-17')]) FROM my_dataset.my_old_table;


I just tried it with your data

I have inserted your data in a BigQuery table:

INSERT INTO dataset.table
  VALUES('{"id": "bar123", "value": "Test", "title": "Test", "alerts":
 [{ "id": "abc123", "title": "Foo", "created": "2020-01-17T23:18:59.769908Z"},
 {"id": "abc124", "title": "Accepting/Denying Claims", "created": "2020-01-17T23:18:59.769908Z"}]}');

and queried it, converting it back to a BigQuery structure:

SELECT STRUCT<cuid STRING, title STRING, created TIMESTAMP>("abc123",
 "Foo", "2020-01-17T23:18:59.769908Z"),("abc124", "Accepting/Denying 
 Claims", "2020-01-17T23:18:59.769908Z") FROM blabla.testingjson;

Output:

Row | f0_.cuid | f0_.title | f0_.created 
----------------------------------------
1   | abc123   | Foo       | 2020-01-17 23:18:59.769908 UTC

Upvotes: 0

Related Questions