dummyds
dummyds

Reputation: 197

How can I get the minimum date based on a condition in Resdhift?

Let's say you have the following dataset:

id    date_col       boolean_col
1     2020-01-01     0
1     2020-01-05     1
1     2020-02-01     0
1     2020-03-01     1
2     2020-01-01     0
2     2020-05-01     0
3     2020-01-01     0
3     2020-03-05     1

My final output should be grouped, one row per id. The way I want to group is: if the boolean column is true, I want to bring the minimum - or maximum, I would like to test both , if possible - date for the id. If all boolean columns for the id are false, then I wanted to get the highest date. The desired output would be something like this:

id    date_col       boolean_col
1     2020-01-05     1
2     2020-05-01     0
3     2020-03-05     1

Any ideas on how to get this? I'm really struggling to find a way

Upvotes: 0

Views: 247

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269633

One method is row_number():

select t.*
from (select t.*,
             row_number() over (partition by id order by boolean_col desc, date desc) as seqnum
      from t
     ) t
where seqnum = 1;

There are two other fun methods. One is aggregation with some cleverness:

select id,
       coalesce(max(case when boolean_col = 1 then date end),
                max(date)
               ) as date,
       max(boolean_col)
from t
group by id;

The other treats this as a prioritization and uses union all:

select id, max(date), boolean_col
from t
where boolean_col = 1
group by id
union all
select id, max(date), max(boolean_col)
from t
group by id
having max(boolean_col) = 0;

Upvotes: 1

Related Questions