Reputation: 37
I have some data that looks like this, I am trying to find the minimum corresponding next_start_date
with each unique cancel_date
.
group_id | individual_id | race | color_of_eye | cancel_date | next_start_date |
---|---|---|---|---|---|
1 | 001 | A | @@ | 2017-11-06 | 2018-04-16 |
2 | 002 | B | $$ | 2017-06-26 | 2017-06-26 |
2 | 002 | B | $$ | 2017-06-26 | 2018-01-23 |
2 | 002 | B | $$ | 2017-06-26 | 2019-02-16 |
3 | 002 | C | << | 2017-12-28 | 2018-01-23 |
3 | 002 | C | << | 2017-12-28 | 2019-02-16 |
4 | 003 | B | ** | 2010-01-26 | 2011-01-23 |
4 | 003 | B | ** | 2010-01-26 | 2019-02-16 |
5 | 004 | K | @@ | 2017-12-28 | 2018-01-23 |
Desired output:
group_id | individual_id | race | color_of_eye | cancel_date | next_start_date |
---|---|---|---|---|---|
1 | 001 | A | @@ | 2017-11-06 | 2018-04-16 |
2 | 002 | B | $$ | 2017-06-26 | 2017-06-26 |
3 | 002 | C | << | 2017-12-28 | 2018-01-23 |
4 | 003 | B | ** | 2010-01-26 | 2011-01-23 |
5 | 004 | K | @@ | 2017-12-28 | 2018-01-23 |
I tried using a window function with
SELECT
*,
MIN(next_start_date) OVER (PARTITION BY individual_id, cancel_date) AS minimum_next_start_date
FROM table
ORDER BY
individual_id, cancel_date, minimum_next_start_date
The results looks like this
group_id | individual_id | race | color_of_eye | cancel_date | minimum_next_start_date |
---|---|---|---|---|---|
1 | 001 | A | @@ | 2017-11-06 | 2018-04-16 |
2 | 002 | B | $$ | 2017-06-26 | 2017-06-26 |
2 | 002 | B | $$ | 2017-06-26 | 2017-06-26 |
2 | 002 | B | $$ | 2017-06-26 | 2017-06-26 |
3 | 002 | C | << | 2017-12-28 | 2018-01-23 |
3 | 002 | C | << | 2017-12-28 | 2018-01-23 |
4 | 003 | B | ** | 2010-01-26 | 2011-01-23 |
4 | 003 | B | ** | 2010-01-26 | 2011-01-23 |
5 | 004 | K | @@ | 2017-12-28 | 2018-01-23 |
Should I do a Min
for minimum_next_start_date
, and group by
every other column there is? Please help.
Upvotes: 0
Views: 964
Reputation: 1269583
From what I can tell, you want aggregation:
select group_id, individual_id, race, color_of_eye, cancel_date,
min(next_start_date)
from t
group by group_id, individual_id, race, color_of_eye, cancel_date;
In Postgres, distinct on
often has better performance. If the group by
defines the groups without the other columns, then you can use:
select distinct on (group_id, individual_id) t.*
from t
order by group_id, individual_id, next_start_date;
Upvotes: 1