Reputation: 1
I have a customer dataset with their start date and enddate. Each customer can have different start date and end date. I want to join it to another table to get a value i.e. status at start date and end date for all customers and populate the first table with status at start date and status at end date.
Can you please help
Not able to find efficient way to do this:
Customers Start Date Enddate
A 01/01/2024 30/06/2024
B 01/01/2024 30/05/2024
C 05/05/2023 01/01/2024
D 28/02/2023 01/01/2024
E 07/06/2023 20/02/2024
Table 2:
Customers Date Status
A 01/01/2024 Active
B 01/01/2024 Active
C 01/01/2024 Inactive
D 01/01/2024 Inactive
E 01/01/2024 Active
A 05/05/2023 Active
B 05/05/2023 Active
C 05/05/2023 Inactive
D 05/05/2023 Inactive
E 05/05/2023 Active
A 28/02/2023 Active
B 28/02/2023 Active
C 28/02/2023 Active
D 28/02/2023 Active
E 28/02/2023 Active
This table has records for all customers and dates with status
Expected result:
Customers Start Date End date Status at start date Status at end date
A 01/01/2024 30/06/2024 Active Active
B 01/01/2024 30/05/2024 Active Active
C 05/05/2023 01/01/2024 Inactive Active
D 28/02/2023 01/01/2024 Active Inactive
E 07/06/2023 20/02/2024 Active Active
Upvotes: 0
Views: 75
Reputation: 88
Get the start dates and end dates separately and then join sub-tables together on the customer id:
SELECT t1.customer
, t1.start_date
, t2.end_date
, t1.status_at_start_date
, t2.status_at_end_date
FROM (
SELECT cs.customer
, c.start_date
, status as status_at_start_date
FROM customerstatus cs
JOIN customers c
ON cs.customer = c.customer
QUALIFY ROW_NUMBER() OVER(
PARTITION BY cs.customer
ORDER BY
CASE WHEN c.start_date = cs.date -- if match then prefer that
THEN '0000-01-01'
ELSE c.start_date -- if no match, just use the earliest date
END) = 1
) t1
JOIN (
SELECT cs.customer
, c.end_date
, status as status_at_end_date
FROM customerstatus cs
JOIN customers c
ON cs.customer = c.customer
QUALIFY ROW_NUMBER() OVER(
PARTITION BY cs.customer
ORDER BY
CASE WHEN c.end_date = cs.date
THEN '9999-12-31'
ELSE c.end_date
END DESC) = 1
) t2
ON t1.customer = t2.customer
ORDER BY t1.customer;
Output:
customer | start_date | end_date | status_at_start_date | status_at_end_date |
---|---|---|---|---|
A | 2024-01-01 | 2024-06-30 | Active | Active |
B | 2024-01-01 | 2024-05-30 | Active | Active |
C | 2023-05-05 | 2024-01-01 | Inactive | Inactive |
D | 2023-02-28 | 2024-01-01 | Active | Inactive |
E | 2023-06-07 | 2024-02-20 | Active | Active |
Upvotes: 0
Reputation: 5633
(select Status from Table2 t2
where t2.Customers=t.Customers and t2.Date<=t.StartDate
order by t2.Date desc limit 1) status_at_Startdate
(select Status from Table2 t2
where t2.Customers=t.Customers and t2.Date<=t.EndDate
order by t2.Date desc limit 1) status_at_Enddate
See example
select *
,(select Status from Table2 t2
where t2.Customers=t.Customers and t2.Date<=t.StartDate
order by t2.Date desc limit 1) status_at_Startdate
,(select Status from Table2 t2
where t2.Customers=t.Customers and t2.Date<=t.EndDate
order by t2.Date desc limit 1) status_at_Enddate
from Table1 t
customers | startdate | enddate | status_at_startdate | status_at_enddate |
---|---|---|---|---|
A | 2024-01-01 | 2024-06-30 | Active | Active |
B | 2024-01-01 | 2024-05-30 | Active | Active |
C | 2023-05-05 | 2024-01-01 | Inactive | Inactive |
D | 2023-02-28 | 2024-01-01 | Active | Inactive |
E | 2023-06-07 | 2024-02-20 | Active | Active |
Upvotes: 1
Reputation: 3010
Table data in markdown text format is more readable.
-First CTE start_status just finds the status at start_date
Sample Query
-- find the status for start date
WITH start_status AS (
SELECT
c.customer_id,
c.start_date,
c.end_date,
t.status AS status_at_start_date
FROM
customers c
LEFT JOIN
table2 t ON c.customer_id = t.customer_id AND t.date = c.start_date
),
-- find the closest status for the end date
end_status AS (
SELECT
customer_id,
end_date,
status AS status_at_end_date
FROM (
SELECT
c.customer_id,
c.end_date,
t.status,
ROW_NUMBER() OVER (PARTITION BY c.customer_id ORDER BY t.date DESC) AS rn
FROM
customers c
LEFT JOIN
table2 t ON c.customer_id = t.customer_id AND t.date <= c.end_date
) subquery
WHERE rn = 1
)
SELECT
ss.customer_id AS Customers,
ss.start_date AS Start_Date,
ss.end_date AS End_Date,
COALESCE(ss.status_at_start_date, es.status_at_end_date) AS Status_at_start_date,
es.status_at_end_date AS Status_at_end_date
FROM
start_status ss
JOIN
end_status es ON ss.customer_id = es.customer_id;
Output
CUSTOMERS | START_DATE | END_DATE | STATUS_AT_START_DATE | STATUS_AT_END_DATE |
---|---|---|---|---|
A | 2024-01-01 | 2024-06-30 | Active | Active |
B | 2024-01-01 | 2024-05-30 | Active | Active |
C | 2023-05-05 | 2024-01-01 | Inactive | Inactive |
D | 2023-02-28 | 2024-01-01 | Active | Inactive |
E | 2023-06-07 | 2024-02-20 | Active | Active |
Upvotes: 1