Alex
Alex

Reputation: 179

How to select specific rows in a "group by" groups using conditions on multiple columns?

I have the following table with many userId (in the example only one userId for demo purpose):enter image description here

For every userId I want to extract two rows:

  1. The first row should be isTransaction = 0 and the earliest date!
  2. The second row should be isTransaction = 1, device should be different from that of the first row, isTransaction should be equal to 1 and the earliest date right after that of the first row

That is, the output should be:

Time         userId      device    isTransaction
2021-01-27   10187675    mobile    0
2021-01-30   10187675    web       1

I tried to rank rows with partitioning and ordering but it didn't work:

Select * from 
  (SELECT *, rank() over(partition by userId, device, isTransaction order by isTransaction, Time) as rnk  
  FROM table 1)
where rnk=1
order by Time

Please help! It would be also good to check the time difference between these two rows to not exceed 30 days. Otherwise, userId should be dropped.

Upvotes: 0

Views: 164

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269773

You can first identify the earliest time for 0. Then enumerate the rows and take only the first one:

select t.*
from (select t.*,
             row_number() over (partition by userid, status order by time) as seqnum
      from (select t.*,
                   min(case when isTransaction = 0 then time end) over (partition by userid order by time) as time_0
            from t
           ) t
      where time > time_0
     ) t
where seqnum = 1;

This satisfies the two conditions you enumerated.

Then buried in the text, you want to eliminate rows where the difference is greater than 30 days. That is a little tricker . . . but not too hard:

select t.*
from (select t.*,
             min(case when isTransaction = 1 then time end) over (partition by userid) as time_1
             row_number() over (partition by userid, status order by time) as seqnum
      from (select t.*,
                   min(case when isTransaction = 0 then time end) over (partition by userid order by time) as time_0
            from t
           ) t
      where time > time_0
     ) t
where seqnum = 1 and
      time_1 < timestamp_add(time_0, interval 30 day);

Upvotes: 1

Related Questions