metersk
metersk

Reputation: 12509

Finding groups of rows with consecutive identical values that have a defined end (SQL Redshift)

I have a table that has the subscription state of users on any given day. The data looks like this

+------------+------------+--------------+
| account_id |    date    | current_plan |
+------------+------------+--------------+
| 1          | 2019-08-01 | free         |
| 1          | 2019-08-02 | free         |
| 1          | 2019-08-03 | yearly       |
| 1          | 2019-08-04 | yearly       |
| 1          | 2019-08-05 | yearly       |
| ...        |            |              |
| 1          | 2020-08-02 | yearly       |
| 1          | 2020-08-03 | free         |
| 2          | 2019-08-01 | monthly      |
| 2          | 2019-08-02 | monthly      |
| ...        |            |              |
| 2          | 2019-08-31 | monthly      |
| 2          | 2019-09-01 | free         |
| ...        |            |              |
| 2          | 2019-11-26 | free         |
| 2          | 2019-11-27 | monthly      |
| ...        |            |              |
| 2          | 2019-12-27 | monthly      |
| 2          | 2019-12-28 | free         |
| 3          | 2020-05-31 | monthly      |
| 3          | 2020-06-01 | monthly      |
| 4          | 2019-08-01 | yearly       |
| ...        |            |              |
| 4          | 2020-06-01 | yearly       |
+------------+------------+--------------+

I would like to have a table that gives the start and end dates of a subscription. It would look something like this. Please note importantly, that account_ids 3 and 4 are not included in this table because they are still in a subscription as of today (6/1/2020). I only would like a summary of people who have moved out of a subscription.

+------------+------------+------------+-------------------+
| account_id | start_date |  end_date  | subscription_type |
+------------+------------+------------+-------------------+
|          1 | 2019-08-03 | 2020-08-02 | yearly            |
|          2 | 2019-08-01 | 2019-08-31 | monthly           |
|          2 | 2019-11-27 | 2019-12-27 | monthly           |
+------------+------------+------------+-------------------+

Currently I have the following which is very close, but still gives me users that have not fallen out of a subscription

select account_id, current_plan, min(date), max(date)
from (select d.*,
             row_number() over (partition by account_id order by date) as seqnum,
             row_number() over (partition by account_id, current_plan order by date) as seqnum_2
      from data d
     ) d
where current_plan not in ('free', 'trial')
group by account_id, current_plan, (seqnum - seqnum_2);

Upvotes: 0

Views: 126

Answers (1)

AlexYes
AlexYes

Reputation: 4208

if you want to do a very simple filter for users that have fallen out as of today you can simply add:

having max(date)<current_date

to your query but this will also include previous fallouts like the first one for user_id=2

but if you want forward looking (like for user id=1) and filter out only the last fallout you need to have a better "gaps and islands" query with lag function, if you check for more "gaps and islands" solutions you will find it... generally, lag(currrent_plan) over (partition by id order by date) gives you the previous day's plan for every day, this way you can identify fallout dates and then rank them over the same window to get the last one for every ID

Upvotes: 1

Related Questions