Reputation: 3
I have a table like this:
I want to transform the table into this:
To get the table 2, I group by value in column 1, then pick the values in column 2 where duration is highest value. Can anyone help me with the syntax?
Upvotes: 0
Views: 31
Reputation: 172994
Below is for Bigquery Standard SQL
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY duration DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY column1
You can test, play with above using sample data from your question as in example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'A' column1, 100 column2, 3 duration UNION ALL
SELECT 'A', 101, 7 UNION ALL
SELECT 'A', 102, 5 UNION ALL
SELECT 'B', 108, 5 UNION ALL
SELECT 'B', 109, 9
)
SELECT AS VALUE ARRAY_AGG(t ORDER BY duration DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY column1
with result
Row column1 column2 duration
1 A 101 7
2 B 109 9
Upvotes: 0
Reputation: 3369
One solution is to add a ROW_NUMBER()
to your data which is ordered on the "Duration" field, then filter on that row number so you are only selecting the highest values from the partition.
Here is an example with a dummy data based on your screenshot:
WITH table AS (
SELECT 1 column1, 100 column2, 3 Duration UNION ALL
SELECT 1, 101, 7 UNION ALL
SELECT 1, 102, 5 UNION ALL
SELECT 2, 108, 5 UNION ALL
SELECT 2, 109, 9
)
SELECT * EXCEPT (rn) FROM(
SELECT column1,
column2,
Duration,
ROW_NUMBER() OVER(PARTITION BY column1 ORDER BY Duration DESC) AS rn
FROM table
)
WHERE rn = 1
Which returns the result:
Upvotes: 1