Reputation: 823
One of my friends had this Q to me and I am too puzzled.
His team is loading a DW and the data keeps coming in incremental and full load fashion on an adhoc basic. Now there is identifier flag that says as to when the full load has started or stopped. Now we need to collect and then segregate all full load.
For ex:
create table #tmp (
id int identity(1,1) not null,
name varchar(30) null,
val int null
)
insert into #tmp (name, val) select 'detroit', 3
insert into #tmp (name, val) select 'california', 9
insert into #tmp (name, val) select 'houston', 1
insert into #tmp (name, val) select 'los angeles', 4
insert into #tmp (name, val) select 'newyork', 8
insert into #tmp (name, val) select 'chicago', 1
insert into #tmp (name, val) select 'seattle', 9
insert into #tmp (name, val) select 'michigan', 6
insert into #tmp (name, val) select 'atlanta', 9
insert into #tmp (name, val) select 'philly', 6
insert into #tmp (name, val) select 'brooklyn', 8
drop table #tmp
The rule is:
whenever val is 9, the full load starts; whenever val is 8, the full load stops; (or when whenever next val is 8, full load stops).
In this case, for full load, I should only collect these records:
id name val
3 houston 1
4 los angeles 4
10 philly 6
My Approach so far:
;with mycte as (
select id, name, val, row_number() over (order by id) as rnkst
from #tmp
where val in (8,9))
SELECT *
FROM mycte y
WHERE val = 9
AND Exists (
SELECT *
FROM mycte x
WHERE x.id =
----> this gives start 9 record but not stop record of 8
(SELECT MIN(id)
FROM mycte z
WHERE z.id > y.id)
AND val = 8)
I do not want to venture into cursor within cursor approach but with a CTE , please enlighten!
UPDATE:
As mentioned by one of the answerers I am restating the rules.
--> the full load records start coming AFTER 9. (9th records are NOT included)
--> the full load continues till it sees immediate 8.
--> So effectively all records BETWEEN 9 and 8 form small chunks of full load
--> An individual 9th record itself does not get considered as it has no 8 as partner
--> The result set shown below satisfies these conditions
Upvotes: 0
Views: 200
Reputation: 77677
I am not sure if my command of English will allow me to explain my approach fully, but I'll try, just in case it can help.
Rank all the rows and rank the bounds (val IN (8, 9)
) separately.
Join the subset where val = 8
with the subset where val = 9
on the condition that the bound ranking of the former should be exactly 1 (one) greater than that of the latter.
Join the subset of non-(8, 9)
rows to the result set of Step 2 on the condition that the (general) ranking should be between the ranking of the val = 9
subset and that of the val = 8
one.
Here's the query to illustrate my attempt at verbal description:
WITH ranked AS (
SELECT
*,
rnk = ROW_NUMBER() OVER (ORDER BY id),
bound_rnk = ROW_NUMBER() OVER (
PARTITION BY CASE WHEN val IN (8, 9) THEN 1 ELSE 2 END
ORDER BY id
)
FROM #tmp
)
SELECT
load.id,
load.name,
load.val
FROM ranked AS eight
INNER JOIN ranked AS nine ON eight.bound_rnk = nine.bound_rnk + 1
INNER JOIN ranked AS load ON load.rnk BETWEEN nine.rnk AND eight.rnk
WHERE eight.val = 8
AND nine .val = 9
AND load .val NOT IN (8, 9)
;
And you might not believe me but, when I tested it, it did return the following:
id name val
-- ----------- ---
3 houston 1
4 los angeles 4
10 philly 6
Upvotes: 1
Reputation: 67075
I do not believe there is a way to do this without a while loop or possibly a recursive cte that is going to be complex. So, my question would be if this is at all possible to accomplish in code? SQL is not as strong as a procedural language, so code would handle this better. If this is not an option, then I would go with a while loop (MUCH better than a cursor). I will create the SQL for this shortly.
/*
drop table #tmp
drop table #finalTmp
drop table #startStop
*/
create table #tmp (
id int identity(1,1) not null,
name varchar(30) null,
val int null
)
insert into #tmp (name, val) select 'detroit', 3
insert into #tmp (name, val) select 'california', 9
insert into #tmp (name, val) select 'houston', 1
insert into #tmp (name, val) select 'los angeles', 4
insert into #tmp (name, val) select 'newyork', 8
insert into #tmp (name, val) select 'chicago', 1
insert into #tmp (name, val) select 'seattle', 9
insert into #tmp (name, val) select 'michigan', 6
insert into #tmp (name, val) select 'atlanta', 9
insert into #tmp (name, val) select 'philly', 6
insert into #tmp (name, val) select 'brooklyn', 8
CREATE TABLE #Finaltmp
(
id INT,
name VARCHAR(30),
val INT
)
SELECT id, val, 0 AS Checked
INTO #StartStop
FROM #tmp
WHERE val IN (8,9)
DECLARE @StartId INT, @StopId INT
WHILE EXISTS (SELECT 1 FROM #StartStop WHERE Checked = 0)
BEGIN
SELECT TOP 1 @StopId = id
FROM #StartStop
WHERE EXISTS
--This makes sure we grab a stop that has a start before it
(
SELECT 1
FROM #StartStop AS TestCheck
WHERE TestCheck.id < #StartStop.id AND val = 9
)
AND Checked = 0 AND val = 8
ORDER BY id
--If no more starts, then the rest are stops
IF @StopId IS NULL
BREAK
SELECT TOP 1 @StartId = id
FROM #StartStop
WHERE Checked = 0 AND val = 9
--Make sure we only pick up the 9 that matches
AND Id < @StopId
ORDER BY Id DESC
IF @StartId IS NULL
BREAK
INSERT INTO #Finaltmp
SELECT *
FROM #tmp
WHERE id BETWEEN @StartId AND @StopId
AND val NOT IN (8,9)
--Make sure to "check" any values that fell in the middle (double 9's)
--If not, then you would start picking up overlap data
UPDATE #StartStop
SET Checked = 1
WHERE id <= @StopId
END
SELECT * FROM #Finaltmp
I noticed that the data looked a little wonky, so I tried to put some edge case checks and comments about them
Upvotes: 0