Reputation: 49
If I have a table like the following:
|**Entry**-------**Name**--------**Date**------------------**Boolean**|
| 1.-----------Car-----------2019-12-01----------True|
| 2.-----------Car-----------2019-12-03----------False|
| 3.-----------Bus-----------2019-12-05----------False|
| 4.-----------Bus-----------2019-12-11----------False|
I want to select a row with distinct Name if Boolean column is true, or if the boolean values are false then select the row with earliest date. So for the car entry I want to select the first row since the boolean for it is true (so I can ignore all other entries) and for the bus entry I want to select the last row since both the booleans for it are false so I want to pick the earliest date (closest to present date).
Edit: Expected output
| 1.-----------Car-----------2019-12-01----------True|
| 4.-----------Bus-----------2019-12-11----------False|
Upvotes: 1
Views: 2511
Reputation: 172974
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY bool DESC, dt DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY name
Note; this is based on "There can only be one row with car and true" from your comment ...
if to apply to sample data from your question as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Car' name, '2019-12-01' dt, TRUE bool UNION ALL
SELECT 'Car', '2019-12-03', FALSE UNION ALL
SELECT 'Bus', '2019-12-05', FALSE UNION ALL
SELECT 'Bus', '2019-12-11', FALSE
)
SELECT AS VALUE ARRAY_AGG(t ORDER BY bool DESC, dt DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY name
result is
Row name dt bool
1 Car 2019-12-01 true
2 Bus 2019-12-11 false
Upvotes: 2