Ashley O
Ashley O

Reputation: 1190

SQL Query Help - Select non null values for each ID (multiple rows)

I have a table in BigQuery where each ID has multiple rows, many with Null values. I want to combine the rows to create a complete set.

Here is a sample set of data:

ID    Address      CreatedDate     City
1    1 1st Street      NULL     New York City
1      NULL          8/18/17      NULL
2      NULL          8/13/17     Boston
2    2 2nd Street    NULL         NULL
3    3 3rd Street    8/1/17      Los Angeles
3     NULL             NULL       NULL
3     NULL           8/7/17       NULL

Here is the expected Output:

ID    Address      CreatedDate     City
1    1 1st Street    8/18/17    New York City
2    2 2nd Street    8/13/17     Boston
3    3 3rd Street    8/1/17      Los Angeles

Honestly I'm not sure this is even possible, but any suggestions would be greatly appreciated!

Upvotes: 2

Views: 3398

Answers (3)

Willian Fuks
Willian Fuks

Reputation: 11787

As another possibility, you could solve this one using the function ARRAY_AGG available in the Standard SQL version of BigQuery (highly recommended that you use standard).

As for the query:

#standardSQL
SELECT
  ID,
  ARRAY_AGG(Address IGNORE NULLS LIMIT 1) Address,
  ARRAY_AGG(CreatedDate IGNORE NULLS LIMIT 1) CreatedDate,
  ARRAY_AGG(City IGNORE NULLS LIMIT 1) City
from
  `your_table`
GROUP BY
  ID

You can test with simulated data:

#standardSQL
WITH data AS(
  SELECT 1 ID, '1 1st Street1' Address, NULL CreatedDate, 'New York City' City UNION ALL
  SELECT 1, NULL, '8/18/17', NULL UNION ALL
  SELECT 2, NULL, '8/13/17', 'Boston' UNION ALL
  SELECT 2, '2 2nd Street', NULL, NULL UNION ALL
  SELECT 3, '3 3rd Street', '8/1/17', 'Los Angeles' UNION ALL
  SELECT 3, NULL, NULL, NULL UNION ALL
  SELECT 3, NULL, '8/7/17', NULL
)

SELECT
  ID,
  ARRAY_AGG(Address IGNORE NULLS LIMIT 1) Address,
  ARRAY_AGG(CreatedDate IGNORE NULLS LIMIT 1) CreatedDate,
  ARRAY_AGG(City IGNORE NULLS LIMIT 1) City
from
  data
GROUP BY
  ID
ORDER BY
  ID

You may find that queries once exceeding resources will be processed in seconds just by using ARRAY_AGG properly.

Upvotes: 0

xQbert
xQbert

Reputation: 35323

aggregation is the key here. assuming you want the max/min value for each group if multiple exist. looking at the date column on ID 3 it's min.

SELECT ID
     , min(address) as Address
     , min(createdDate) as createdDate
     , min(city) as City
FROM table
GROUP BY ID

Upvotes: 2

juergen d
juergen d

Reputation: 204766

Group by the column you want to be unique and use max() to get the results containing something for each id

select ID, max(Address), max(CreatedDate), max(City)
from your_table
group by ID

Upvotes: 3

Related Questions