Reputation: 583
I have a table set up like this:
CREATE TABLE dbo.IntervalCounts (
item_id int NOT NULL,
interval_time time(0) NOT NULL,
interval_count int DEFAULT 0 NOT NULL
)
Each item_id has 96 interval_times, from 00:00 to 23:45 in 15 minute increments. each interval_time has an interval_count >= 0. This table has approx. 200 million rows.
I need to select values from a table where the count is the highest, then, if there are multiple qualifying rows with the same count, pick the one with the lowest interval time.
So, if I have a item_id 1, whose max count is 100:
item_id interval_time interval_count
1 00:00 100
1 13:15 100
1 07:45 100
1 19:30 100
I'd like to get just one row:
item_id interval_time interval_count
1 00:00 100
Getting the first selection is easy enough, I've got:
SELECT a.item_id, a.interval_time, a.interval_count
FROM dbo.IntervalCounts a
LEFT JOIN dbo.IntervalCounts b
ON a.item_id = b.item_id
AND a.interval_count < b.interval_count
WHERE 1=1
AND b.interval_count IS NULL
However, getting it down to just one row has proven tricky for me.
This triple self-join ran for an hour and a half before I killed it (I'll be running it regularly, ideally it would run no more than 15 minutes max).
SELECT a.item_id, a.interval_time, a.interval_count
FROM dbo.IntervalCounts a
LEFT JOIN dbo.IntervalCounts b
ON a.item_id = b.item_id
AND a.interval_count < b.interval_count
LEFT JOIN dbo.IntervalCounts c
ON a.item_id = c.item_id
-- if I remove this line, it will ALWAYS give me the 00:00 interval
-- if I keep it, it runs way too long
AND a.interval_count = c.interval_count
AND a.interval_time > c.interval_time
WHERE 1=1
AND b.interval_count IS NULL
AND c.interval_time IS NULL
Doing something like this just seems ungainly, and I was also forced to kill the execution after about an hour and a half:
DECLARE @tempTable TABLE
(
item_id int,
interval_time time(0),
interval_count int
)
INSERT INTO @tempTable
SELECT a.item_id, a.interval_time, a.interval_count
FROM dbo.IntervalCount a
LEFT JOIN dbo.IntervalCount b
ON a.item_id = b.item_id
AND a.interval_count < b.interval_count
WHERE 1=1
AND b.interval_count IS NULL
SELECT a.item_id, a.interval_time, a.interval_count
FROM @tempTable a
LEFT JOIN @tempTable b
ON a.item_id = b.item_id
AND a.interval_time > b.interval_time
WHERE 1=1
AND b.interval_time IS NULL
There must be a better way, but I'm stumped. How can I do this in a manner that won't take forever to run?
Upvotes: 2
Views: 61
Reputation: 70638
You are overthinking it, you can use ROW_NUMBER
:
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY item_id
ORDER BY interval_count DESC, interval_time)
FROM dbo.IntervalCounts
)
SELECT *
FROM CTE
WHERE RN = 1;
Upvotes: 4