Reputation: 7
I have been given a task where the data set looks like this (as an example) within SQL Server:
Week | Unit | Customer | Activity |
---|---|---|---|
2024-01-08 | a | 123 | Install |
2024-01-08 | b | 123 | Install |
2024-01-08 | c | 123 | Removal |
2024-01-15 | d | 123 | Removal |
2024-01-15 | e | 123 | Install |
2024-01-22 | f | 123 | Removal |
2024-01-22 | g | 123 | Install |
2024-06-16 | h | 123 | Removal |
2024-06-16 | i | 123 | Removal |
2024-08-01 | j | 123 | Install |
And the goal is to determine whether each install/removal is part of a replacement (meaning that each removal is offset by one install within an 8 week span, or vice versa) or if it is a new install (doesn't have any offsetting removal within 8 weeks and is true "growth") or a true removal (doesn't have an offsetting install within 8 weeks). This example data set is only showing one customer, but I have many customers and a "replacement" within 8 weeks would be an install and removal at the same customer.
I am having a hard time writing into my view how to have each activity "find" its opposite to define itself as a replacement or to determine that it is a new install or true removal because it doesn't have a match within that 8 week span. Has anyone done anything like this or have any ideas on how I could get started?
My desired output would be something like this (not including the explanation column, I just included it to explain):
Week | Unit | Customer | Activity | Type | Explanation |
---|---|---|---|---|---|
2024-01-08 | a | 123 | Install | Replacement Install | Offsets c removal |
2024-01-08 | b | 123 | Install | Replacement Install | Offsets d removal |
2024-01-08 | c | 123 | Removal | Replacement Removal | Offsets a install |
2024-01-15 | d | 123 | Removal | Replacement Removal | Offsets b install |
2024-01-15 | e | 123 | Install | Replacement Install | Offsets f removal |
2024-01-22 | f | 123 | Removal | Replacement Removal | Offsets e install |
2024-01-22 | g | 123 | Install | True Install | No offsetting removal within +- 8 weeks |
2024-06-16 | h | 123 | Removal | Replacement Removal | Offsets j install |
2024-06-16 | i | 123 | Removal | True Removal | No offsetting install within +- 8 weeks |
2024-08-01 | j | 123 | Install | Replacement Install | Offsets h removal |
The specific units that match up as part of the replacements don't really matter, I just need to be able to extract the number of true installs, replacements and true removals within a specific span of time for a customer. Each piece of activity can only be paired up with one opposite, though!
Thank you!
Upvotes: 0
Views: 96
Reputation: 448
Create two temp tables.
Put all installs in one table and all removals in the other.
Join both tables on unit and customer and the condition that the removal date is >= install date and removal date is < install date + 8 weeks.
From install as a outer join removal as b
on a.unit = b.unit and a.customer = b.customer and b.date >= a.date and b.date < dateadd(ww, 8, a.date)
Store the result in a temp table. Let us call it #Installs_Mapped_To_Removal
Split this temp table again in install and removal.
If an install has a removal within 8 weeks the removal columns will be filled. If not they are NULL. This way you know whether a replacement was done.
For removal it works the same. If install is filled there is a belonging install. If it is NULl there was none.
Look out for the possibility of two or more removals belonging to one install or two or more installs belonging to one removal. If you don't care about these instances filter them out with distinct because you will get duplicates in the new install and removal tables.
If you care about them you have to group within your table #Installs_Mapped_To_Removal and split afterwards. This way you can count or do String_Agg if needed.
Finally you will have the tables instals and removals filled with data about whether they belong to another install/removal.
Just union them and you have one wonderful table fitting to your needs.
If you provide me with the insert statement for your testdata I write the code and post it here. Then you can adapt it for your original data.
Edit: Ruthelessly going to steal from Alan Schofield.
Edit 2: Result below + Link --> https://dbfiddle.uk/OUHRaPE6
drop table if exists myData
CREATE TABLE myData (Week date, Unit varchar(1), Customer int, Activity varchar(10))
INSERT INTO myData VALUES
('2024-01-15', 'a', 123, 'Install'),
('2024-01-08', 'b', 123, 'Install'),
('2024-01-08', 'c', 123, 'Removal'),
('2024-01-15', 'd', 123, 'Removal'),
('2024-01-15', 'e', 123, 'Install'),
('2024-01-22', 'f', 123, 'Removal'),
('2024-07-22', 'h', 123, 'Install'),
('2024-05-27', 'g', 123, 'Removal'),
('2024-06-16', 'i', 123, 'Removal'),
('2024-08-01', 'j', 123, 'Install')
Select
Removals.Week as Removal_Week
, Removals.Unit as Removal_Unit
, Removals.Customer as Removal_Customer
, Removals.Activity as Removal_Activity
, case when Installs.Week is NULL then 'True Removal'
else 'Replacement Removal'
end as Removal_Type
, case when Installs.Week is NULL then 'No Install within 8 weeks afterwards.'
else 'Belongs to Unit ' + Installs.Unit
end as Removal_Explanation
, Installs.Week as Install_Week
, Installs.Unit as Install_Unit
, Installs.Customer as Install_Customer
, Installs.Activity as Install_Activity
, case when Removals.Week is NULL then 'True Install'
else 'Replacement Install'
end as Install_Type
, case when Removals.Week is NULL then 'No Removal within 8 weeks before.'
else 'Belongs to Unit ' + Removals.Unit
end as Install_Explanation
into #Installs_Mapped_To_Removal
From (select * From myData where Activity = 'Removal') as Removals
Full Outer Join (select * From myData where Activity = 'Install') as Installs
on Removals.Customer = Installs.Customer
and Removals.Week < Installs.Week
and Removals.Week >= dateadd(ww, -8, Installs.Week)
Select Removal_Week as Week
, Removal_Unit as Unit
, Removal_Customer as Customer
, Removal_Activity as Activity
, Removal_Type as Type
, Removal_Explanation as Explanation
into #Removals
From #Installs_Mapped_To_Removal
where Removal_Week is not NULL
Select Install_Week as Week
, Install_Unit as Unit
, Install_Customer as Customer
, Install_Activity as Activity
, Install_Type as Type
, Install_Explanation as Explanation
into #Installs
From #Installs_Mapped_To_Removal
where Install_Week is not NULL
Select *
From #Removals
Union all
Select *
From #Installs
Edit 3: For a view you will want to have all the code in one statement. Since this would have 3 or 4 levels of indents just replace the temp tables with CTEs. https://learn.microsoft.com/de-de/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver16
With the documentation and my code I trust you will figure it out ;)
Upvotes: 1