am_am
am_am

Reputation: 239

Bigquery - json_extract all elements from an array

i'm trying to extract two key from every json in an arry of jsons(using sql legacy) currently i am using json extract function :

json_extract(json_column , '$[1].X') AS X,
json_extract(json_column , '$[1].Y') AS Y,

how can i make it run on every json at the 'json arry column', and not just [1] (for example)?

An example json:

[

{"blabla":000,"X":1,"blabla":000,"blabla":000,"blabla":000,,"Y":"2"},

{"blabla":000,"X":3,"blabla":000,"blabla":000,"blabla":000,,"Y":"4"},

]   

thanks in advance!

Upvotes: 21

Views: 62160

Answers (2)

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

Update 2020: JSON_EXTRACT_ARRAY()

Now BigQuery supports JSON_EXTRACT_ARRAY():

For example, to solve this particular question:

SELECT id
  , ARRAY(
      SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') 
      FROM UNNEST(JSON_EXTRACT_ARRAY(payload, "$.commits"))x
  ) emails
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'

enter image description here


Previous answer

Let's start with a similar problem - this is not a very convenient way to extract all emails from a json array:

SELECT id
  , [ JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[0].author.email')  
      , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[1].author.email')  
      , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[2].author.email')  
      , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[3].author.email')
    ] emails
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'

enter image description here

The best way we have right now to deal with this is to use some JavaScript in an UDF to split a json-array into a SQL array:

CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(json).map(x=>JSON.stringify(x));
"""; 

SELECT * EXCEPT(array_commits),
  ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') FROM UNNEST(array_commits) x) emails
FROM (
  SELECT id
    , json2array(JSON_EXTRACT(payload, '$.commits')) array_commits
  FROM `githubarchive.day.20180830` 
  WHERE type='PushEvent' 
  AND id='8188163772'
)

enter image description here

Upvotes: 47

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

May 1st, 2020 Update

A new function, JSON_EXTRACT_ARRAY, has been just added to the list of JSON functions. This function allows you to extract the contents of a JSON document as a string array.

so in below you can replace use of CUSTOM_JSON_EXTRACT UDF with just in-built function JSON_EXTRACT_ARRAY as in below example

#standardSQL
SELECT 
  JSON_EXTRACT_SCALAR(json , '$.X') AS X,
  JSON_EXTRACT_SCALAR(json , '$.Y') AS Y
FROM t, UNNEST(JSON_EXTRACT_ARRAY(json_column , '$')) json   

==============

Below example for BigQuery Standard SQL and allows you to be close to standard way of working with JSONPath and no extra manipulation needed so you just simply use CUSTOM_JSON_EXTRACT(json, json_path) function

#standardSQL
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
        return jsonPath(JSON.parse(json), json_path);
"""
OPTIONS (
    library="gs://your_bucket/jsonpath-0.8.0.js"
);
WITH t AS (
SELECT '''
[
{"blabla1":1,"X":1,"blabla2":3,"blabla3":5,"blabla4":7,"Y":"2"},
{"blabla1":2,"X":3,"blabla2":4,"blabla3":6,"blabla4":8,"Y":"4"}
]   
''' AS json_column 
)
SELECT 
  CUSTOM_JSON_EXTRACT(json_column , '$[*].X') AS X,
  CUSTOM_JSON_EXTRACT(json_column , '$[*].Y') AS Y
FROM t   

result will be

Row X   Y    
1   1   2    
    3   4      

Note: to overcome current BigQuery's "limitation" for JsonPath, above solution uses custom function along with external library - jsonpath-0.8.0.js that can be downloaded from https://code.google.com/archive/p/jsonpath/downloads and uploaded to Google Cloud Storage - gs://your_bucket/jsonpath-0.8.0.js

Just re-read Felipe's answer - for his example above solution will look like below (just as FYI)

SELECT 
  id, 
  CUSTOM_JSON_EXTRACT(payload, '$.commits[*].author.email') emails
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'

Upvotes: 14

Related Questions