Kyle
Kyle

Reputation: 63

How to parse list of jsons in BigQuery

I've got a list of jsons associated to an ID that I need to parse out in BigQuery. Not too familiar with creating UDFs or Javascript but here's my attempt

The data is structured in two columns, one being ID and the second a json string: [{"date":"2019-12-07","amount":10},{"date":"2019-12-08","amount":20,{"date":"2019-12-09","amount":30},{"date":"2019-12-10","amount":40}]

My ideal output is something like:

id        date     amount
10     2019-12-07    10
10     2019-12-08    20
10     2019-12-09    30
10     2019-12-10    40

Here is my SQL so far:

#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS STRUCT<date DATE>
LANGUAGE js AS """
return JSON.parse(input);
""";

select
  t.id,
  trim(t.inputs,'[]')
from test t

Any help is much appreciated!

Upvotes: 0

Views: 918

Answers (2)

Felipe Hoffa
Felipe Hoffa

Reputation: 59325

If you have well defined strings, using SQL-only will give you faster and more scalable results:

CREATE TEMP FUNCTION sql_parse(str STRING) AS ((
  SELECT ARRAY_AGG(STRUCT(date, amount))
  FROM (
    SELECT REGEXP_EXTRACT_ALL(str, '"date":"([-0-9]*)') dates, REGEXP_EXTRACT_ALL(str, '"amount":([0-9]*)') amounts
  ), UNNEST(dates) date WITH OFFSET i, UNNEST(amounts) amount WITH OFFSET j WHERE i=j
));


WITH input AS (
SELECT 10 id, '[{"date":"2019-12-07","amount":10},{"date":"2019-12-08","amount":20,{"date":"2019-12-09","amount":30},{"date":"2019-12-10","amount":40}]' x)

SELECT id, date, amount
FROM input, UNNEST(sql_parse(x));

enter image description here

For better JSON-arrays support in BigQuery, upvote this:

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173171

Below is for BigQuery Standard SQL

#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(input).map(x=>JSON.stringify(x));
"""; 
WITH `project.dataset.table` AS (
  SELECT 1 id, '[{"date":"2019-12-07","amount":10},{"date":"2019-12-08","amount":20},{"date":"2019-12-09","amount":30},{"date":"2019-12-10","amount":40}]' inputs
)
SELECT
  id,
  JSON_EXTRACT_SCALAR(item, '$.date') `date`,
  JSON_EXTRACT_SCALAR(item, '$.amount') amount
FROM `project.dataset.table`, UNNEST(JsonToItems(inputs)) item

with output

Row id  date        amount   
1   1   2019-12-07  10   
2   1   2019-12-08  20   
3   1   2019-12-09  30   
4   1   2019-12-10  40   

Upvotes: 0

Related Questions