Reputation: 11
I have a table with data of customer with interaction to touchpoint . I need to calculate the time-difference of customer & identify the customer as repeat if the customer is touching the point with in 180 seconds , but if the customer come again after 180 seconds of last interaction it will be termed as Unique & the counter will start again ..
Input Table
UID | CREATION_DATE |
---|---|
Cust 1 | 13-06-2023 15:12 |
Cust 1 | 13-06-2023 18:43 |
Cust 1 | 13-06-2023 18:43 |
Cust 2 | 08-06-2023 22:25 |
Cust 2 | 10-06-2023 14:32 |
Cust 3 | 12-06-2023 10:23 |
Cust 4 | 13-06-2023 18:57 |
Cust 4 | 13-06-2023 18:58 |
Cust 4 | 13-06-2023 18:58 |
I have calculate the the time difference using timestamp & lag function using
SELECT unique_cust_id as UID ,creation_date,
ROW_NUMBER() OVER ( PARTITION BY unique_cust_id ORDER BY subs_msisdn ,creation_date aSC ) row_num,
timestampdiff ( 2,creation_date - lag (creation_date) OVER (PARTITION BY unique_cust_id ORDER BY creation_date ))
AS diff
FROM DB2PROJ.TESTING_REPEAT
with output
UID | CREATION_DATE | ROW_NUM | DIFF_time |
---|---|---|---|
Cust 1 | 13-06-2023 15:12 | 1 | |
Cust 1 | 13-06-2023 18:43 | 2 | 12665 |
Cust 1 | 13-06-2023 18:43 | 3 | 14 |
Cust 2 | 08-06-2023 22:25 | 1 | |
Cust 2 | 10-06-2023 14:32 | 2 | 144432 |
Cust 3 | 12-06-2023 10:23 | 1 | |
Cust 4 | 13-06-2023 18:57 | 1 | |
Cust 4 | 13-06-2023 18:58 | 2 | 34 |
Cust 4 | 13-06-2023 18:58 | 3 | 43 |
Desired Output
UID | CREATION_DATE | ROW_NUM | DIFF_time | Cum_Diff | Category |
---|---|---|---|---|---|
Cust 1 | 13-06-2023 15:12 | 1 | 0 | Unique | |
Cust 1 | 13-06-2023 18:43 | 2 | 12665 | 0 | Unique |
Cust 1 | 13-06-2023 18:43 | 3 | 14 | 14 | Repeat |
Cust 2 | 08-06-2023 22:25 | 1 | 0 | Unique | |
Cust 2 | 10-06-2023 14:32 | 2 | 144432 | 0 | Unique |
Cust 3 | 12-06-2023 10:23 | 1 | 0 | Unique | |
Cust 4 | 13-06-2023 18:57 | 1 | 0 | Unique | |
Cust 4 | 13-06-2023 18:58 | 2 | 34 | 34 | Repeat |
Cust 4 | 13-06-2023 18:58 | 3 | 43 | 77 | Repeat |
Above cum_diff is calculated in excel where if row_num is 1 . termed as 0 & Unique in category , cumulative addition is done of diff_time & if its >180 then it termed as 0 in cum_diff & Unique in category & if its with in <=180 then it added & termed as repeat in category
Please help to build in excel
Upvotes: 0
Views: 82
Reputation: 9181
For the category, you just need a case expression to check if the time difference between the current time and the previous time is greater than 180. For your cumulative sum of time differences, you have a gaps and islands problem, one approach to solve it is the use of a running sum that increased by 1 whenever the time difference is exceeding 180.
WITH t AS
(
SELECT unique_cust_id as UID ,creation_date,
ROW_NUMBER() OVER (PARTITION BY unique_cust_id ORDER BY creation_date ) row_num,
timestampdiff(second,
lag(creation_date, 1, creation_date) OVER (PARTITION BY unique_cust_id ORDER BY creation_date), creation_date) AS DIFF_time
FROM TESTING_REPEAT
),
grps AS
(
SELECT *,
CASE WHEN DIFF_time > 180 OR row_num = 1 THEN 'Unique' ELSE 'Repeat' END AS Category,
SUM(CASE WHEN DIFF_time > 180 THEN 1 ELSE 0 END) OVER (PARTITION BY UID ORDER BY creation_date) grp
FROM t
)
SELECT UID, creation_date, row_num, DIFF_time,
SUM(CASE WHEN DIFF_time <= 180 THEN DIFF_time ELSE 0 END) OVER (PARTITION BY UID, grp ORDER BY creation_date) AS Cum_Diff,
Category, grp
FROM grps
demo on MySQL
Upvotes: 0