Reputation: 551
I am trying to group rows that have the save value across two columns and have the result ranked/sorted based on a third column.
The result should contain all the other columns.
For the table:
with sample as (
select 'A' as c1, 'B' as c2, '22:00' as c3, 'Da' as c4
union all
select 'A' as c1, 'B' as c2, '23:00' as c3, 'Db' as c4
union all
select 'A' as c1, 'B' as c2, '09:00' as c3, 'Dc' as c4
union all
select 'A' as c1, 'C' as c2, '22:00' as c3, 'Dd' as c4
union all
select 'B' as c1, 'C' as c2, '09:00' as c3, 'De' as c4
)
Grouping or filtering by column c1 and c2 ranked by time on c3, the output would be:
row_number() over (partition by c1, c2 order by c3) as rnk
| c1, c2, c3, c4, rnk|
-----------------------
| A | B |09:00| Dc| 1 |
| A | B |22:00| Da| 2 |
| A | B |23:00| Db| 3 |
| A | C |22:00| Dd| 1 |
| B | C |09:00| De| 1 |
All the other columns like c4, c5.. should be kept but don't have any effect on the group criteria or rank.
A believe a window function with partition on c1 and c2 and order by c3 could work, but not sure if it's the best way in case of very large tables and the need to group by more columns.
The final output would be a UNIQUE row where rank is 1 (top). The columns should be exactly the same as the sample table (no rank).
Select * from tableX where rnk = 1
would do the work but keep colum 'rnk'.
I want to avoid writing all the columns in the select, excluding the rnk.
| c1, c2, c3, c4 |
-------------------
| A | B |09:00| Dc|
| A | C |22:00| Dd|
| B | C |09:00| De|
*Edited, add final table
Upvotes: 1
Views: 2367
Reputation: 44991
select inline(array(rec))
from (select struct(*) as rec
,row_number() over
(
partition by c1,c2
order by c3
) as rn
from sample t
) t
where rn = 1
;
+------+------+-------+------+
| col1 | col2 | col3 | col4 |
+------+------+-------+------+
| A | B | 09:00 | Dc |
| A | C | 22:00 | Dd |
| B | C | 09:00 | De |
+------+------+-------+------+
P.s. Please note that the columns names were aliased, due to the use of struct
Upvotes: 4
Reputation: 1270873
I think you just want row_number()
:
select t.*,
row_number() over (partition by c1, c2 order by c3) as rnk
from sample t;
The question seems to have changed since I answered it -- a rather rude thing to happen. If you want the top ranked column, then use a subquery:
select t.*
from (select t.*,
row_number() over (partition by c1, c2 order by c3) as rnk
from sample t
) t
where rnk = 1;
This returns one row for each combination of c1/c2 in the data. If you want all rows in the event of ties, then use rank()
instead of row_number()
.
Upvotes: 0