Maykid
Maykid

Reputation: 517

Unnesting Multiple Columns in BigQuery

I'm having trouble with a UNNEST query that I'm working with. Below is an example of the query, the result I'm currently getting and the result I'm looking to get from it.

A little bit of context, the upload that I'm currently doing is spacing out the ID's with A instead of the , forcing it as a String instead of a Number as there are multiple ID's within the same cell. The Price is separated out by , as well. Example of the data that is being uploaded:

    Name    |    Date    |   Item_ID   |  Price
    John    |  4/17/2020 | 123A456A678 | 19.99,21.99,30.00
    Joe     |  4/17/2020 | 555A777A888 | 8.99,10.00,15.99
    Jake    |  4/18/2020 |   444A333   | 15.99,9.00
    John    |  4/18/2020 |     432     | 75.99
    Megan   |  4/18/2020 | 12A890A23A99| 5.99,6.99,9.99,10.00

This is a sample of what the data looks like in the table before trying to UNNEST. The example of the current UNNEST query is below with a Sample of the output.

With data AS(
  SELECT
    Name,
    Date,
    SPLIT(Item_ID, 'A') AS Item_ID_Split,
    SPLIT(Price, ',') AS Price_Split
FROM
  Example.Table
SELECT
  Name,
  Date,
  Item_ID_Split,
  Price_Split
FROM data,
UNNEST(Item_ID_Split) Item_ID_Split WITH OFFSET pos1
UNNEST(Price_Split) Price_Split WITH OFFSET pos2

The current output looks something like the following:

    Name   |    Date   |  Item_ID_Split | Price_Split
    John   | 4/17/2020 |      123       |   19.99
    John   | 4/17/2020 |      456       |   19.99
    John   | 4/17/2020 |      678       |   19.99
    John   | 4/17/2020 |      123       |   21.99
    John   | 4/17/2020 |      456       |   21.99
    John   | 4/17/2020 |      678       |   21.99
    John   | 4/17/2020 |      123       |   30.00
    John   | 4/17/2020 |      456       |   30.00
    John   | 4/17/2020 |      678       |   30.00
    Joe    | 4/17/2020 |      555       |   8.99
    Joe    | 4/17/2020 |      777       |   8.99
    Joe    | 4/17/2020 |      888       |   8.99
    Joe    | 4/17/2020 |      555       |   10.00
    Joe    | 4/17/2020 |      777       |   10.00
    Joe    | 4/17/2020 |      888       |   10.00
    Joe    | 4/17/2020 |      555       |   15.99
    Joe    | 4/17/2020 |      777       |   15.99
    Joe    | 4/17/2020 |      888       |   15.99
    Jake   | 4/18/2020 |      444       |   15.99
    Jake   | 4/18/2020 |      333       |   15.99
    Jake   | 4/18/2020 |      444       |   9.00
    Jake   | 4/18/2020 |      333       |   9.00
    John   | 4/18/2020 |      432       |   75.99
    Megan  | 4/18/2020 |      12        |   5.99
    Megan  | 4/18/2020 |      890       |   5.99
    Megan  | 4/18/2020 |      23        |   5.99
    Megan  | 4/18/2020 |      99        |   5.99
    Megan  | 4/18/2020 |      12        |   6.99
    Megan  | 4/18/2020 |      890       |   6.99
    Megan  | 4/18/2020 |      23        |   6.99
    Megan  | 4/18/2020 |      99        |   6.99
    Megan  | 4/18/2020 |      12        |   9.99
    Megan  | 4/18/2020 |      890       |   9.99
    Megan  | 4/18/2020 |      23        |   9.99
    Megan  | 4/18/2020 |      99        |   9.99
    Megan  | 4/18/2020 |      12        |   10.00
    Megan  | 4/18/2020 |      890       |   10.00
    Megan  | 4/18/2020 |      23        |   10.00
    Megan  | 4/18/2020 |      99        |   10.00

This is the current output from the query above. As you can see there are duplicate Item_IDs/Prices, what I want to have the result be is the following:

    Name   |    Date   |  Item_ID_Split | Price_Split
    John   | 4/17/2020 |      123       |   19.99
    John   | 4/17/2020 |      456       |   21.99
    John   | 4/17/2020 |      678       |   30.00
    Joe    | 4/17/2020 |      555       |   8.99
    Joe    | 4/17/2020 |      777       |   10.00
    Joe    | 4/17/2020 |      888       |   15.99
    Jake   | 4/18/2020 |      444       |   15.99
    Jake   | 4/18/2020 |      333       |   9.00
    John   | 4/18/2020 |      432       |   75.99
    Megan  | 4/18/2020 |      12        |   5.99
    Megan  | 4/18/2020 |      890       |   6.99
    Megan  | 4/18/2020 |      23        |   9.99
    Megan  | 4/18/2020 |      99        |   10.00

This is the result that I'm looking for where there is no duplication at all between Item_ID_Split and Price_Split. I've tried to put the SPLIT function within the UNNEST but I get the same output. I'm not entirely sure how to accomplish this so any help would be much appreciated!

Thank you in advance!

Upvotes: 0

Views: 3323

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

Below is for BigQuery Standard SQL

#standardSQL
SELECT Name, Day, Splits.*
FROM (
  SELECT Name, Day, 
    ARRAY(
      SELECT AS STRUCT Item_ID_Split, Price_Split
      FROM UNNEST(SPLIT(Item_ID, 'A')) AS Item_ID_Split WITH OFFSET
      JOIN UNNEST(SPLIT(Price, ',')) AS Price_Split WITH OFFSET
      USING(OFFSET)
    ) AS arr
  FROM `project.dataset.table`
), UNNEST(arr) Splits   

If to apply to sample data form your question as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'John' Name, '4/17/2020' Day, '123A456A678' Item_ID,'19.99,21.99,30.00' Price UNION ALL
  SELECT 'Joe', '4/17/2020', '555A777A888','8.99,10.00,15.99' UNION ALL
  SELECT 'Jake', '4/18/2020', '444A333','15.99,9.00' UNION ALL
  SELECT 'John', '4/18/2020', '432','75.99' UNION ALL
  SELECT 'Megan', '4/18/2020', '12A890A23A99','5.99,6.99,9.99,10.00' 
)
SELECT Name, Day, Splits.*
FROM (
  SELECT Name, Day, 
    ARRAY(
      SELECT AS STRUCT Item_ID_Split, Price_Split
      FROM UNNEST(SPLIT(Item_ID, 'A')) AS Item_ID_Split WITH OFFSET
      JOIN UNNEST(SPLIT(Price, ',')) AS Price_Split WITH OFFSET
      USING(OFFSET)
    ) AS arr
  FROM `project.dataset.table`
), UNNEST(arr) Splits   

the output is

Row Name    Day         Item_ID_Split   Price_Split  
1   John    4/17/2020   123             19.99    
2   John    4/17/2020   456             21.99    
3   John    4/17/2020   678             30.00    
4   Joe     4/17/2020   555             8.99     
5   Joe     4/17/2020   777             10.00    
6   Joe     4/17/2020   888             15.99    
7   Jake    4/18/2020   444             15.99    
8   Jake    4/18/2020   333             9.00     
9   John    4/18/2020   432             75.99    
10  Megan   4/18/2020   12              5.99     
11  Megan   4/18/2020   890             6.99     
12  Megan   4/18/2020   23              9.99     
13  Megan   4/18/2020   99              10.00   

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269563

You can use with offset:

SELECT Name, Date, Item_ID_Split, Price_Split
FROM data LEFT JOIN
     UNNEST(Item_ID_Split) Item_ID_Split WITH OFFSET pos1
     ON 1=1 LEFT JOIN
     UNNEST(Price_Split) Price_Split WITH OFFSET pos2
     ON pos1 = po2;

Upvotes: 4

Related Questions