Reputation: 2810
I am trying to write a query in sql where I need to find the max no. of consecutive months over a period of last 12 months excluding June and July.
so for example I have an initial table as follows
+---------+--------------+-----------+------------+
| id | Payment | amount | Date |
+---------+--------------+-----------+------------+
| 1 | CJ1 | 70000 | 11/3/2020 |
| 1 | 1B4 | 36314000 | 12/1/2020 |
| 1 | I21 | 119439000 | 1/12/2021 |
| 1 | 0QO | 9362100 | 2/2/2021 |
| 1 | 1G0 | 140431000 | 2/23/2021 |
| 1 | 1G | 9362100 | 3/2/2021 |
| 1 | g5d | 9362100 | 4/6/2021 |
| 1 | rt5s | 13182500 | 4/13/2021 |
| 1 | fgs5 | 48598 | 5/18/2021 |
| 1 | sd8 | 42155 | 5/25/2021 |
| 1 | wqe8 | 47822355 | 7/20/2021 |
| 1 | cbg8 | 4589721 | 7/27/2021 |
| 1 | jlk8 | 4589721 | 8/3/2021 |
| 1 | cxn9 | 4589721 | 10/5/2021 |
| 1 | qwe | 45897210 | 11/9/2021 |
| 1 | mmm | 45897210 | 12/16/2021 |
+---------+--------------+-----------+------------+
I have written below query:
SELECT customer_number, year, month,
payment_month - lag(payment_month) OVER(partition by customer_number ORDER BY year, month) as previous_month_indicator,
FROM
(
SELECT DISTINCT Month(date) as month, Year(date) as year, CUSTOMER_NUMBER
FROM Table1
WHERE Month(date) not in (6,7)
and TO_DATE(date,'yyyy-MM-dd') >= DATE_SUB('2021-12-31', 425)
and customer_number = 1
) As C
and I get this output
+-----------------+------+-------+--------------------------+
| customer_number | year | month | previous_month_indicator |
+-----------------+------+-------+--------------------------+
| 1 | 2020 | 11 | null |
| 1 | 2020 | 12 | 1 |
| 1 | 2021 | 1 | -11 |
| 1 | 2021 | 2 | 1 |
| 1 | 2021 | 3 | 1 |
| 1 | 2021 | 4 | 1 |
| 1 | 2021 | 5 | 1 |
| 1 | 2021 | 8 | 3 |
| 1 | 2021 | 10 | 2 |
| 1 | 2021 | 11 | 1 |
+-----------------+------+-------+--------------------------+
What I want is to get a view like this Expected output
+-----------------+------+-------+--------------------------+
| customer_number | year | month | previous_month_indicator |
+-----------------+------+-------+--------------------------+
| 1 | 2020 | 11 | 1 |
| 1 | 2020 | 12 | 1 |
| 1 | 2021 | 1 | 1 |
| 1 | 2021 | 2 | 1 |
| 1 | 2021 | 3 | 1 |
| 1 | 2021 | 4 | 1 |
| 1 | 2021 | 5 | 1 |
| 1 | 2021 | 8 | 1 |
| 1 | 2021 | 9 | 0 |
| 1 | 2021 | 10 | 1 |
| 1 | 2021 | 11 | 1 |
+-----------------+------+-------+--------------------------+
As June/July does not matter, after May, August should be considered as consecutive month, and since in September there was no record it appears as 0 and breaks the consecutive months chain.
My final desired output is to get the max no of consecutive months in which transactions were made which in above case is 8 from Nov-2020 to Aug-2021
Final Desired Output:
+-----------------+-------------------------+
| customer_number | Max_consecutive_months |
+-----------------+-------------------------+
| 1 | 8 |
+-----------------+-------------------------+
Upvotes: 6
Views: 831
Reputation: 360
CTEs can break this down a little easier. In the code below, the payment_streak
CTE is the key bit; the start_of_streak
field is first marking rows that count as the start of a streak, and then taking the maximum over all previous rows (to find the start of this streak).
The last SELECT
is only comparing these two dates, computing how many months are between them (excluding June/July), and then finding the best streak per customer.
WITH payments_in_context AS (
SELECT customer_number,
date,
lag(date) OVER (PARTITION BY customer_number ORDER BY date) AS prev_date
FROM Table1
WHERE EXTRACT(month FROM date) NOT IN (6,7)
),
payment_streak AS (
SELECT
customer_number,
date,
max(
CASE WHEN (prev_date IS NULL)
OR (EXTRACT(month FROM date) <> 8
AND (date - prev_date >= 62
OR MOD(12 + EXTRACT(month FROM date) - EXTRACT(month FROM prev_date),12)) > 1))
OR (EXTRACT(month FROM date) = 8
AND (date - prev_date >= 123
OR EXTRACT(month FROM prev_date) NOT IN (5,8)))
THEN date END
) OVER (PARTITION BY customer_number ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
as start_of_streak
FROM payments_in_context
)
SELECT customer_number,
max( 1 +
10*(EXTRACT(year FROM date) - EXTRACT(year FROM start_of_streak))
+ (EXTRACT(month FROM date) - EXTRACT(month FROM start_of_streak))
+ CASE WHEN (EXTRACT(month FROM date) > 7 AND EXTRACT(month FROM start_of_streak) < 6)
THEN -2
WHEN (EXTRACT(month FROM date) < 6 AND EXTRACT(month FROM start_of_streak) > 7)
THEN 2
ELSE 0 END
) AS max_consecutive_months
FROM payment_streak
GROUP BY 1;
Upvotes: 1
Reputation: 71451
You can use a recursive cte
to generate all the dates in the twelve month timespan for each customer id
, and then find the maximum number of consecutive dates excluding June and July in that interval:
with recursive cte(id, m, c) as (
select cust_id, min(date), 1 from payments group by cust_id
union all
select c.id, c.m + interval 1 month, c.c+1 from cte c where c.c <= 12
),
dts(id, m, f) as (
select c.id, c.m, c.c = 1 or exists
(select 1 from payments p where p.cust_id = c.id and extract(month from p.date) = extract(month from (c.m - interval 1 month)) and extract(year from p.date) = extract(year from (c.m - interval 1 month)))
from cte c where extract(month from c.m) not in (6,7)
),
result(id, f, c) as (
select d.id, d.f, (select sum(d.id = d1.id and d1.m < d.m and d1.f = 0)+1 from dts d1)
from dts d where d.f != 0
)
select r1.id, max(r1.s)-1 from (select r.id, r.c, sum(r.f) s from result r group by r.id, r.c) r1 group by r1.id
Upvotes: 1