Reputation: 1190
I have a table in BigQuery where each ID has multiple rows, many with Null values. I want to combine the rows to create a complete set.
Here is a sample set of data:
ID Address CreatedDate City
1 1 1st Street NULL New York City
1 NULL 8/18/17 NULL
2 NULL 8/13/17 Boston
2 2 2nd Street NULL NULL
3 3 3rd Street 8/1/17 Los Angeles
3 NULL NULL NULL
3 NULL 8/7/17 NULL
Here is the expected Output:
ID Address CreatedDate City
1 1 1st Street 8/18/17 New York City
2 2 2nd Street 8/13/17 Boston
3 3 3rd Street 8/1/17 Los Angeles
Honestly I'm not sure this is even possible, but any suggestions would be greatly appreciated!
Upvotes: 2
Views: 3398
Reputation: 11787
As another possibility, you could solve this one using the function ARRAY_AGG available in the Standard SQL version of BigQuery (highly recommended that you use standard).
As for the query:
#standardSQL
SELECT
ID,
ARRAY_AGG(Address IGNORE NULLS LIMIT 1) Address,
ARRAY_AGG(CreatedDate IGNORE NULLS LIMIT 1) CreatedDate,
ARRAY_AGG(City IGNORE NULLS LIMIT 1) City
from
`your_table`
GROUP BY
ID
You can test with simulated data:
#standardSQL
WITH data AS(
SELECT 1 ID, '1 1st Street1' Address, NULL CreatedDate, 'New York City' City UNION ALL
SELECT 1, NULL, '8/18/17', NULL UNION ALL
SELECT 2, NULL, '8/13/17', 'Boston' UNION ALL
SELECT 2, '2 2nd Street', NULL, NULL UNION ALL
SELECT 3, '3 3rd Street', '8/1/17', 'Los Angeles' UNION ALL
SELECT 3, NULL, NULL, NULL UNION ALL
SELECT 3, NULL, '8/7/17', NULL
)
SELECT
ID,
ARRAY_AGG(Address IGNORE NULLS LIMIT 1) Address,
ARRAY_AGG(CreatedDate IGNORE NULLS LIMIT 1) CreatedDate,
ARRAY_AGG(City IGNORE NULLS LIMIT 1) City
from
data
GROUP BY
ID
ORDER BY
ID
You may find that queries once exceeding resources will be processed in seconds just by using ARRAY_AGG
properly.
Upvotes: 0
Reputation: 35323
aggregation is the key here. assuming you want the max/min value for each group if multiple exist. looking at the date column on ID 3 it's min.
SELECT ID
, min(address) as Address
, min(createdDate) as createdDate
, min(city) as City
FROM table
GROUP BY ID
Upvotes: 2
Reputation: 204766
Group by the column you want to be unique and use max()
to get the results containing something for each id
select ID, max(Address), max(CreatedDate), max(City)
from your_table
group by ID
Upvotes: 3