Reputation: 517
I'm having trouble with a UNNEST query that I'm working with. Below is an example of the query, the result I'm currently getting and the result I'm looking to get from it.
A little bit of context, the upload that I'm currently doing is spacing out the ID's
with A
instead of the ,
forcing it as a String instead of a Number as there are multiple ID's within the same cell. The Price
is separated out by ,
as well. Example of the data that is being uploaded:
Name | Date | Item_ID | Price
John | 4/17/2020 | 123A456A678 | 19.99,21.99,30.00
Joe | 4/17/2020 | 555A777A888 | 8.99,10.00,15.99
Jake | 4/18/2020 | 444A333 | 15.99,9.00
John | 4/18/2020 | 432 | 75.99
Megan | 4/18/2020 | 12A890A23A99| 5.99,6.99,9.99,10.00
This is a sample of what the data looks like in the table before trying to UNNEST. The example of the current UNNEST query is below with a Sample of the output.
With data AS(
SELECT
Name,
Date,
SPLIT(Item_ID, 'A') AS Item_ID_Split,
SPLIT(Price, ',') AS Price_Split
FROM
Example.Table
SELECT
Name,
Date,
Item_ID_Split,
Price_Split
FROM data,
UNNEST(Item_ID_Split) Item_ID_Split WITH OFFSET pos1
UNNEST(Price_Split) Price_Split WITH OFFSET pos2
The current output looks something like the following:
Name | Date | Item_ID_Split | Price_Split
John | 4/17/2020 | 123 | 19.99
John | 4/17/2020 | 456 | 19.99
John | 4/17/2020 | 678 | 19.99
John | 4/17/2020 | 123 | 21.99
John | 4/17/2020 | 456 | 21.99
John | 4/17/2020 | 678 | 21.99
John | 4/17/2020 | 123 | 30.00
John | 4/17/2020 | 456 | 30.00
John | 4/17/2020 | 678 | 30.00
Joe | 4/17/2020 | 555 | 8.99
Joe | 4/17/2020 | 777 | 8.99
Joe | 4/17/2020 | 888 | 8.99
Joe | 4/17/2020 | 555 | 10.00
Joe | 4/17/2020 | 777 | 10.00
Joe | 4/17/2020 | 888 | 10.00
Joe | 4/17/2020 | 555 | 15.99
Joe | 4/17/2020 | 777 | 15.99
Joe | 4/17/2020 | 888 | 15.99
Jake | 4/18/2020 | 444 | 15.99
Jake | 4/18/2020 | 333 | 15.99
Jake | 4/18/2020 | 444 | 9.00
Jake | 4/18/2020 | 333 | 9.00
John | 4/18/2020 | 432 | 75.99
Megan | 4/18/2020 | 12 | 5.99
Megan | 4/18/2020 | 890 | 5.99
Megan | 4/18/2020 | 23 | 5.99
Megan | 4/18/2020 | 99 | 5.99
Megan | 4/18/2020 | 12 | 6.99
Megan | 4/18/2020 | 890 | 6.99
Megan | 4/18/2020 | 23 | 6.99
Megan | 4/18/2020 | 99 | 6.99
Megan | 4/18/2020 | 12 | 9.99
Megan | 4/18/2020 | 890 | 9.99
Megan | 4/18/2020 | 23 | 9.99
Megan | 4/18/2020 | 99 | 9.99
Megan | 4/18/2020 | 12 | 10.00
Megan | 4/18/2020 | 890 | 10.00
Megan | 4/18/2020 | 23 | 10.00
Megan | 4/18/2020 | 99 | 10.00
This is the current output from the query above. As you can see there are duplicate Item_IDs/Prices, what I want to have the result be is the following:
Name | Date | Item_ID_Split | Price_Split
John | 4/17/2020 | 123 | 19.99
John | 4/17/2020 | 456 | 21.99
John | 4/17/2020 | 678 | 30.00
Joe | 4/17/2020 | 555 | 8.99
Joe | 4/17/2020 | 777 | 10.00
Joe | 4/17/2020 | 888 | 15.99
Jake | 4/18/2020 | 444 | 15.99
Jake | 4/18/2020 | 333 | 9.00
John | 4/18/2020 | 432 | 75.99
Megan | 4/18/2020 | 12 | 5.99
Megan | 4/18/2020 | 890 | 6.99
Megan | 4/18/2020 | 23 | 9.99
Megan | 4/18/2020 | 99 | 10.00
This is the result that I'm looking for where there is no duplication at all between Item_ID_Split
and Price_Split
. I've tried to put the SPLIT
function within the UNNEST
but I get the same output. I'm not entirely sure how to accomplish this so any help would be much appreciated!
Thank you in advance!
Upvotes: 0
Views: 3323
Reputation: 172974
Below is for BigQuery Standard SQL
#standardSQL
SELECT Name, Day, Splits.*
FROM (
SELECT Name, Day,
ARRAY(
SELECT AS STRUCT Item_ID_Split, Price_Split
FROM UNNEST(SPLIT(Item_ID, 'A')) AS Item_ID_Split WITH OFFSET
JOIN UNNEST(SPLIT(Price, ',')) AS Price_Split WITH OFFSET
USING(OFFSET)
) AS arr
FROM `project.dataset.table`
), UNNEST(arr) Splits
If to apply to sample data form your question as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'John' Name, '4/17/2020' Day, '123A456A678' Item_ID,'19.99,21.99,30.00' Price UNION ALL
SELECT 'Joe', '4/17/2020', '555A777A888','8.99,10.00,15.99' UNION ALL
SELECT 'Jake', '4/18/2020', '444A333','15.99,9.00' UNION ALL
SELECT 'John', '4/18/2020', '432','75.99' UNION ALL
SELECT 'Megan', '4/18/2020', '12A890A23A99','5.99,6.99,9.99,10.00'
)
SELECT Name, Day, Splits.*
FROM (
SELECT Name, Day,
ARRAY(
SELECT AS STRUCT Item_ID_Split, Price_Split
FROM UNNEST(SPLIT(Item_ID, 'A')) AS Item_ID_Split WITH OFFSET
JOIN UNNEST(SPLIT(Price, ',')) AS Price_Split WITH OFFSET
USING(OFFSET)
) AS arr
FROM `project.dataset.table`
), UNNEST(arr) Splits
the output is
Row Name Day Item_ID_Split Price_Split
1 John 4/17/2020 123 19.99
2 John 4/17/2020 456 21.99
3 John 4/17/2020 678 30.00
4 Joe 4/17/2020 555 8.99
5 Joe 4/17/2020 777 10.00
6 Joe 4/17/2020 888 15.99
7 Jake 4/18/2020 444 15.99
8 Jake 4/18/2020 333 9.00
9 John 4/18/2020 432 75.99
10 Megan 4/18/2020 12 5.99
11 Megan 4/18/2020 890 6.99
12 Megan 4/18/2020 23 9.99
13 Megan 4/18/2020 99 10.00
Upvotes: 1
Reputation: 1269563
You can use with offset
:
SELECT Name, Date, Item_ID_Split, Price_Split
FROM data LEFT JOIN
UNNEST(Item_ID_Split) Item_ID_Split WITH OFFSET pos1
ON 1=1 LEFT JOIN
UNNEST(Price_Split) Price_Split WITH OFFSET pos2
ON pos1 = po2;
Upvotes: 4