Finding groups of rows with consecutive identical values that have a defined end (SQL Redshift)

Question

I have a table that has the subscription state of users on any given day. The data looks like this

+------------+------------+--------------+
| account_id |    date    | current_plan |
+------------+------------+--------------+
| 1          | 2019-08-01 | free         |
| 1          | 2019-08-02 | free         |
| 1          | 2019-08-03 | yearly       |
| 1          | 2019-08-04 | yearly       |
| 1          | 2019-08-05 | yearly       |
| ...        |            |              |
| 1          | 2020-08-02 | yearly       |
| 1          | 2020-08-03 | free         |
| 2          | 2019-08-01 | monthly      |
| 2          | 2019-08-02 | monthly      |
| ...        |            |              |
| 2          | 2019-08-31 | monthly      |
| 2          | 2019-09-01 | free         |
| ...        |            |              |
| 2          | 2019-11-26 | free         |
| 2          | 2019-11-27 | monthly      |
| ...        |            |              |
| 2          | 2019-12-27 | monthly      |
| 2          | 2019-12-28 | free         |
| 3          | 2020-05-31 | monthly      |
| 3          | 2020-06-01 | monthly      |
| 4          | 2019-08-01 | yearly       |
| ...        |            |              |
| 4          | 2020-06-01 | yearly       |
+------------+------------+--------------+

I would like to have a table that gives the start and end dates of a subscription. It would look something like this. Please note importantly, that account_ids 3 and 4 are not included in this table because they are still in a subscription as of today (6/1/2020). I only would like a summary of people who have moved out of a subscription.

+------------+------------+------------+-------------------+
| account_id | start_date |  end_date  | subscription_type |
+------------+------------+------------+-------------------+
|          1 | 2019-08-03 | 2020-08-02 | yearly            |
|          2 | 2019-08-01 | 2019-08-31 | monthly           |
|          2 | 2019-11-27 | 2019-12-27 | monthly           |
+------------+------------+------------+-------------------+

Currently I have the following which is very close, but still gives me users that have not fallen out of a subscription

select account_id, current_plan, min(date), max(date)
from (select d.*,
             row_number() over (partition by account_id order by date) as seqnum,
             row_number() over (partition by account_id, current_plan order by date) as seqnum_2
      from data d
     ) d
where current_plan not in ('free', 'trial')
group by account_id, current_plan, (seqnum - seqnum_2);

Finding groups of rows with consecutive identical values that have a defined end (SQL Redshift)

Answers (1)

Related Questions