Reputation: 63
I've got a list of jsons associated to an ID that I need to parse out in BigQuery. Not too familiar with creating UDFs or Javascript but here's my attempt
The data is structured in two columns, one being ID and the second a json string:
[{"date":"2019-12-07","amount":10},{"date":"2019-12-08","amount":20,{"date":"2019-12-09","amount":30},{"date":"2019-12-10","amount":40}]
My ideal output is something like:
id date amount
10 2019-12-07 10
10 2019-12-08 20
10 2019-12-09 30
10 2019-12-10 40
Here is my SQL so far:
#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS STRUCT<date DATE>
LANGUAGE js AS """
return JSON.parse(input);
""";
select
t.id,
trim(t.inputs,'[]')
from test t
Any help is much appreciated!
Upvotes: 0
Views: 918
Reputation: 59325
If you have well defined strings, using SQL-only will give you faster and more scalable results:
CREATE TEMP FUNCTION sql_parse(str STRING) AS ((
SELECT ARRAY_AGG(STRUCT(date, amount))
FROM (
SELECT REGEXP_EXTRACT_ALL(str, '"date":"([-0-9]*)') dates, REGEXP_EXTRACT_ALL(str, '"amount":([0-9]*)') amounts
), UNNEST(dates) date WITH OFFSET i, UNNEST(amounts) amount WITH OFFSET j WHERE i=j
));
WITH input AS (
SELECT 10 id, '[{"date":"2019-12-07","amount":10},{"date":"2019-12-08","amount":20,{"date":"2019-12-09","amount":30},{"date":"2019-12-10","amount":40}]' x)
SELECT id, date, amount
FROM input, UNNEST(sql_parse(x));
For better JSON-arrays support in BigQuery, upvote this:
Upvotes: 1
Reputation: 173171
Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input).map(x=>JSON.stringify(x));
""";
WITH `project.dataset.table` AS (
SELECT 1 id, '[{"date":"2019-12-07","amount":10},{"date":"2019-12-08","amount":20},{"date":"2019-12-09","amount":30},{"date":"2019-12-10","amount":40}]' inputs
)
SELECT
id,
JSON_EXTRACT_SCALAR(item, '$.date') `date`,
JSON_EXTRACT_SCALAR(item, '$.amount') amount
FROM `project.dataset.table`, UNNEST(JsonToItems(inputs)) item
with output
Row id date amount
1 1 2019-12-07 10
2 1 2019-12-08 20
3 1 2019-12-09 30
4 1 2019-12-10 40
Upvotes: 0