Reputation: 13
I am trying to write a query that defines and then describes 'subscription activity' from a user transaction table. Let's say the transaction table we're working from has customer_id, created_at (transaction date), merchant_name, transaction_amount.
The final output table should have customer_Id as the first column, activity_month as the second column, merchant_name_x as the third column and _merchant_name_y as the 4th column (can have infinite columns from then on for various merchants). And here's the tricky part: the table should be populated with '1' if that cell represents a string of at least 3 consecutive months of that user transacting at least once with that merchant, and '0' otherwise.
Here is basic code that got me what would look like the output table with '1' for activity and '0' for no activity. It needs to be amended so that '1' appears only if its part of an unbroken string of >= 3 months of consecutive activity for the given merchant and '0' otherwise.
SELECT customer_id
, LAST_DAY(created_at::DATE) AS month
, MAX(CASE WHEN merchant_name = 'Amazon Prime' THEN 1 ELSE 0 END) AS amazon
, MAX(CASE WHEN merchant_name = 'Netflix.com' THEN 1 ELSE 0 END) AS netflix
FROM TABLE
GROUP BY 1,2
The output would look something like this: '1' denotes that the user transacted with that merchant at least once (doesn't matter if once, twice, or 300 times that month) and that it is part of >= 3 consecutive months of that user transacting with that merchant
CUSTOMER_ID MONTH AMAZON NETFLIX
54321 2019-04-30 1 0
54321 2019-03-31 1 0
54321 2019-02-29 1 1
54321 2019-01-31 1 1
54321 2018-12-31 0 1
54321 2018-11-30 0 0
Working from a TABLE that lists and describes transactions:
ColumnsData Type
TRANSACTION_ID NUMBER(38,0)
CREATED_AT TIMESTAMP_NTZ(9)
AMOUNT_DOLLARS NUMBER(38,0)
CUSTOMER_ID NUMBER(38,0)
MERCHANT_NAME VARCHAR(16777216)
(a sample of the data table ):
TRANSACTION_ID CREATED_AT AMOUNT_DOLLARS CUSTOMER_ID MERCHANT_NAME
1234567 2018-08-23 57.31 306797979 Amazon Prime
7654321 2020-09-21 10.99 309221214 Stp & Shop
9999971 2020-01-07 11.59 509227711 Lyft Com
6549875 2019-05-10 88.23 311188226 Lttle Caesar
3121541 2020-07-31 72.01 307746845 Redbox
1279875 2020-04-05 15.20 315151515 Family Dollar
(timestamp left out from CREATED_AT for compactness) Query must scale across thousands of customers each making many transactions. Many thanks.
Upvotes: 1
Views: 92
Reputation: 3034
I think if I understand correctly you will need to use a windowed function like below:
UPDATE: updated to reflect sample data. I took the data you provided and added 3 additional records for Customer_ID = 306797979 and Merchant_name = Amazon. What you will see in the results is that the 2018-08 and 09 records are 0 because there are not 3 consecutive months of an amazon subscription at that point. When we hit 2018-10 and 11 they both now have 3 subsequent periods of transactions.
create temporary table temp (
TRANSACTION_ID NUMBER(38,0),
CREATED_AT TIMESTAMP_NTZ(9),
AMOUNT_DOLLARS NUMBER(38,0),
CUSTOMER_ID NUMBER(38,0),
MERCHANT_NAME VARCHAR(16777216)
);
insert into temp
values
(1234567, '2018-08-23 00:00:00', 57.31, 306797979, 'Amazon Prime'),
(7654321, '2020-09-21 00:00:00', 10.99, 309221214, 'Stp & Shop'),
(9999971, '2020-01-07 00:00:00', 11.59, 509227711, 'Lyft Com'),
(6549875, '2019-05-10 00:00:00', 88.23, 311188226, 'Lttle Caesar'),
(3121541, '2020-07-31 00:00:00', 72.01, 307746845, 'Redbox'),
(1279875, '2020-04-05 00:00:00', 15.20, 315151515, 'Family Dollar'),
(1234567, '2018-11-23 00:00:00', 57.31, 306797979, 'Amazon Prime'),
(1234236, '2018-09-23 00:00:00', 57.31, 306797979, 'Amazon Prime'),
(3972831, '2018-10-23 00:00:00', 57.31, 306797979, 'Amazon Prime');
with _filler_dates as (
select
date_trunc('MONTH',dateadd(
month,
'-' || row_number() over (order by null),
dateadd(day, 1 , current_date()))) as filler_date
from table (generator(rowcount => 30))
),
_data as(
select customer_id
, merchant_name
, amount_dollars
, date_trunc('MONTH', CREATED_AT) as cur_month
, COALESCE(LEAD(cur_month,1,NULL) OVER (
PARTITION BY customer_id
ORDER BY cur_month)
, date_trunc('MONTH', current_date())) as next_month
from temp)
,_merged as (
select d.customer_id
, d.merchant_name
, CASE WHEN fd.filler_date <> cur_month then NULL else d.amount_dollars end as amount_dollars
, fd.filler_date
from _data d
join _filler_dates fd
on fd.filler_date between d.cur_month and dateadd(MONTH, -1, d.next_month)
)
select *
, CASE WHEN sum(CASE WHEN amount_dollars > 0 THEN 1 ELSE 0 END)
OVER (PARTITION BY customer_id, merchant_name
ORDER BY filler_date ASC
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) >= 3
THEN 1 ELSE 0 END as amazon
from _merged
order by 1,2,4;
//drop table temp;
Upvotes: 1