Giedrius
Giedrius

Reputation: 145

convert bigquery json string to columns

For json data that comes as a string I would like to have something like JSON_EXTRACT_SCALAR but for a flexible number of outcome columns.

Here is sample data - different rows can have different column names, and json can be nested:

WITH `my_table` AS (
  SELECT '{"sku_types":"{\"id\":\"5433306\",\"product_code\":\"adfklj_ewkj\"}","additional_info":"Face 30 ml","stock_level":"20+"}' as json_string 
  union all 
  SELECT '{"additional_info":"Face 100 ml","offer_info":"30%"}' as json_string 
)
SELECT * 
from my_table;

I would like to have this data extracted into separate columns: sku_types.id, sku_types.product_code, additional_info, stock_level, offer_info.

Can this be done in SQL or is javascript necessary?

I do not know in advance the names of json fields, so I was not able to do this using JSON_EXTRACT_SCALAR or JSON_EXTRACT.

Upvotes: 3

Views: 5512

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below example for BigQuery Standard SQL

#standardSQL
CREATE TEMPORARY FUNCTION parseJson(y STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  var z = new Array();
  processKey(JSON.parse(y), '');
  function processKey(node, parent) {
    Object.keys(node).map(function(key) {
      value = node[key].toString();
      if (value !== '[object Object]') {
        if (parent !== '' && parent.substr(parent.length-1) !== '.') {
          z.push(parent + '.' + key + ':' + value)
        } else {
          z.push(key + ':' + value)
        }
      } else {
        if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
        processKey(node[key], parent + key);
      };
    });         
  };
  return z
""";
WITH `my_table` AS (
  SELECT 1 id, '{"sku_types":{"id":"5433306","product_code":"adfklj_ewkj"},"additional_info":"Face 30 ml","stock_level":"20+"}' AS json_string UNION ALL 
  SELECT 2, '{"additional_info":"Face 100 ml","offer_info":"30%"}' AS json_string 
)
SELECT id, 
  ARRAY(
    SELECT AS STRUCT SPLIT(kv, ':')[OFFSET(0)] key, SPLIT(kv, ':')[SAFE_OFFSET(1)] value
    FROM UNNEST(parseJson(json_string)) kv
  ) params
FROM my_table

with result

Row id  params.key              params.value     
1   1   sku_types.id            5433306  
        sku_types.product_code  adfklj_ewkj  
        additional_info         Face 30 ml   
        stock_level             20+  
2   2   additional_info         Face 100 ml  
        offer_info              30%     

as you can see instead of parsing all possible attributes into separate columns (which is quite not possible here - unless you know them in advance) - above approach flattens them into key:value pairs inside params array

Note: in above example I am using : to construct key:value pairs and then split them. If you expect values have this char - you can adjust code and instead of : use something more unique - for example :::::::

Quick update to address comment:
... problem is that some of json values are null, in which case it throws and error

#standardSQL
CREATE TEMPORARY FUNCTION parseJson(y STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  var z = new Array();
  processKey(JSON.parse(y), '');
  function processKey(node, parent) {
    Object.keys(node).map(function(key) {
      if (!node[key]) {
        value = 'n/a'
      } else {
        value = node[key].toString();
      }
      if (value !== '[object Object]') {
        if (parent !== '' && parent.substr(parent.length-1) !== '.') {
          z.push(parent + '.' + key + ':' + value)
        } else {
          z.push(key + ':' + value)
        }
      } else {
        if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
        processKey(node[key], parent + key);
      };
    });         
  };
  return z
""";
WITH `my_table` AS (
  SELECT 1 id, '{"sku_types":{"id":"5433306","product_code":"adfklj_ewkj"},"additional_info":"Face 30 ml","stock_level":"20+"}' AS json_string UNION ALL 
  SELECT 2, '{"additional_info":"Face 100 ml","offer_info":"30%"}' AS json_string union all
  SELECT 3 as id , '{"offer_info":"30%", "price":null}' AS json_string  
)
SELECT id, 
  ARRAY(
    SELECT AS STRUCT SPLIT(kv, ':')[OFFSET(0)] key, SPLIT(kv, ':')[SAFE_OFFSET(1)] value
    FROM UNNEST(parseJson(json_string)) kv
  ) params
FROM my_table  

with result

Row id  params.key              params.value     
1   1   sku_types.id            5433306  
        sku_types.product_code  adfklj_ewkj  
        additional_info         Face 30 ml   
        stock_level             20+  
2   2   additional_info         Face 100 ml  
        offer_info              30%  
3   3   offer_info              30%  
        price                   n/a    

As you can see her - I am replacing nulls with 'n/a' but you can apply whatever logic you want

Upvotes: 4

Related Questions