OUMOUSS_ELMEHDI
OUMOUSS_ELMEHDI

Reputation: 519

bigquery unnest values in array depending on case

I have a very special data table that i want to transform for a visualisation purpose (see image). I am aware of the cross join unnest which does the equivalent of a cartesian product, however in this case i need to reduplicate depending on the value next to the *

enter image description here

Upvotes: 0

Views: 2652

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173046

Below is yet another simple option for BigQuery Standard SQL (based on use of REPEAT function)

#standardSQL
SELECT id, value
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(t.values, r'\[|]', ''))) x,
UNNEST(SPLIT(REPEAT(
  CONCAT(',', SPLIT(x, '*')[OFFSET(0)]), 
  IFNULL(CAST(SPLIT(x, '*')[SAFE_OFFSET(1)] AS INT64), 1)
))) value
WHERE value != ''  

You can test, play with above using sample data from your question as in example below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 AS id, '[5*2,8,6]' AS `values` UNION ALL
  SELECT 2, '[5*2,0*3]' UNION ALL
  SELECT 3, '[1*1,2,5,6]'
)
SELECT id, value
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(t.values, r'\[|]', ''))) x,
UNNEST(SPLIT(REPEAT(
  CONCAT(',', SPLIT(x, '*')[OFFSET(0)]), 
  IFNULL(CAST(SPLIT(x, '*')[SAFE_OFFSET(1)] AS INT64), 1)
))) value
WHERE value != ''   

with result

Row id  value    
1   1   5    
2   1   5    
3   1   8    
4   1   6    
5   2   5    
6   2   5    
7   2   0    
8   2   0    
9   2   0    
10  3   1    
11  3   2    
12  3   5    
13  3   6      

Upvotes: 1

Elliott Brossard
Elliott Brossard

Reputation: 33755

Here is a solution using a SQL UDF, alternatively. This should be faster since it avoids the latency of using the JavaScript v8 sandbox:

CREATE TEMP FUNCTION ExpandList(input STRING) AS (
  ARRAY(
    -- Find the value before the *
    SELECT SPLIT(elem, '*')[OFFSET(0)]
    -- For each comma-separated element inside the braces
    FROM UNNEST(REGEXP_EXTRACT_ALL(input, r'[^\[\],]+')) AS elem,
    -- Repeated by the value after the *, or once if there is no *
    UNNEST(GENERATE_ARRAY(1, IFNULL(CAST(SPLIT(elem, '*')[SAFE_OFFSET(1)] AS INT64), 1))))
);

WITH Input AS (
  SELECT 1 AS id, '[5*2,8,6]' AS values UNION ALL
  SELECT 2, '[5*2,0*3]' UNION ALL
  SELECT 3, '[1,2,5,6]'
)
SELECT id, value
FROM Input,
UNNEST(ExpandList(values)) AS value;

Upvotes: 3

Tamir Klein
Tamir Klein

Reputation: 3642

Below is an SQL example with a TEMP function which demonstrate how you can flatten your array

CREATE TEMP FUNCTION flatten(input ARRAY<STRING>)
RETURNS ARRAY<STRING>
LANGUAGE js AS """

        let flatten = []
        for (let j = 0; j < input.length; j++) {
            if (input[j].indexOf('*') === -1) {
                flatten.push(input[j])
            } else {
                let prefix = input[j].split('*')[1]
                let value = input[j].split('*')[0]

                for (let i = 0; i < prefix; i++) {
                    flatten.push(value)
                }
            }
        }
        return flatten
""";


WITH numbers AS
  (SELECT ['5*2','8','6'] as value
 )
SELECT flatten(value) as product
FROM numbers;

The output of this call as requested is:

enter image description here

Upvotes: 2

Related Questions