Reputation: 256741
I have a table of values that some need attention:
| ID | AddedDate |
|---------|-------------|
| 1 | 2010-04-01 |
| 2 | 2010-04-01 |
| 3 | 2010-04-02 |
| 4 | 2010-04-02 |
| 5 | NULL | <----------- needs attention
| 6 | 2010-04-02 |
| 7 | 2010-04-03 |
| 8 | 2010-04-04 |
| 9 | 2010-04-04 |
| 2432659 | 2016-06-15 |
| 2432650 | 2016-06-16 |
| 2432651 | 2016-06-17 |
| 2432672 | 2016-06-18 |
| 2432673 | NULL | <----------- needs attention
| 2432674 | 2016-06-20 |
| 2432685 | 2016-06-21 |
I want to select the rows where AddedDate
is null, and i want to select rows around it. In this example question it would be sufficient to say rows where the ID
is ±3. This means i want:
| ID | AddedDate |
|---------|-------------|
| 2 | 2010-04-01 | ─╮
| 3 | 2010-04-02 | │
| 4 | 2010-04-02 | │
| 5 | NULL | ├──ID values ±3
| 6 | 2010-04-02 | │
| 7 | 2010-04-03 | │
| 8 | 2010-04-04 | ─╯
| 2432672 | 2016-06-18 | ─╮
| 2432673 | NULL | ├──ID values ±3
| 2432674 | 2016-06-20 | ─╯
Note: In reality it's a table of 9M rows, and 15k need attention.
First i create a query that builds the ranges i'm interested in returning:
SELECT
ID-3 AS [Low ID],
ID+3 AS [High ID]
FROM Items
WHERE AddedDate IS NULL
Low ID High ID
------- -------
2 8
2432670 2432676
So my initial attempt to use this does work:
WITH dt AS (
SELECT ID-3 AS Low, ID+3 AS High
FROM Items
WHERE AddedDate IS NULL
)
SELECT * FROM Items
WHERE EXISTS(
SELECT 1 FROM dt
WHERE Items.ID BETWEEN dt.Low AND dt.High)
But when i try it on real data:
There's probably a more efficient way.
Upvotes: 18
Views: 3197
Reputation: 1133
Just to try a different approach than other answers... How about using a table variable to store the ids you want. Then you join. My hope is that the insert executes fast enough, and then the SELECT can take advantage of the clustered index in Items. Unfortunately I don't have here your amount of data to test its efficiency:
DECLARE @userData TABLE(
idInRange int NOT NULL
)
INSERT INTO @userData (idInRange)
SELECT DISTINCT i.Id + r
FROM Items i
CROSS JOIN (
SELECT -3 as r UNION ALL SELECT -2 as r UNION ALL SELECT -1 as r UNION ALL SELECT 0 as r UNION ALL
SELECT 1 as r UNION ALL SELECT 2 as r UNION ALL SELECT 3 as r
) yourRange
WHERE AddedDate IS NULL;
SELECT i.*
FROM @userData u
INNER JOIN Items i ON i.ID = u.idInRange
Edited to add a DISTINCT when filling the table variable, to avoid duplicated rows just in case there are two contiguous NULL dates and their id ranges overlap
Upvotes: 1
Reputation: 60472
This is your existing logic rewritten using an moving max:
WITH dt AS (
SELECT
ID, AddedDate,
-- check if there's a NULL within a range of +/- 3 rows
-- and remember it's ID
max(case when AddedDate is null then id end)
over (order by id
rows between 3 preceding and 3 following) as NullID
FROM Items
)
SELECT *
FROM dt
where id between NullID-3 and NullID+3
Upvotes: 4
Reputation: 6563
Another way:
SELECT i1.*
FROM Items i1, Items i2
WHERE i2.AddedDate IS NULL AND ABS(i1.ID - i2.ID) <= 3
I hope there is index on AddedDate
column.
Upvotes: 3
Reputation: 1269913
Here is one method that uses the windowing clause:
select i.*
from (select i.*,
count(*) over (order by id rows between 3 preceding and 1 preceding) as cnt_prec,
count(*) over (order by id rows between 1 following and 3 following) as cnt_foll,
count(addeddate) over (order by id rows between 3 preceding and 1 preceding) as cnt_ad_prec,
count(addeddate) over (order by id rows between 1 following and 3 following) as cnt_ad_foll
from items
) i
where cnt_ad_prec <> cnt_prec or
cnt_ad_foll <> cnt_foll or
addeddate is null;
order by id;
This returns all rows that have NULL
in the column or are within three rows of a NULL
.
The need for the comparison to the count is to avoid the edge issues on the smallest and largest ids.
Upvotes: 4