Reputation: 573
I'm attempting to optimize some queries that run against a large amount of data. I'll try simplifying the problem here. Let's start with an example table:
CREATE TABLE [dbo].[TestTable]
(
[ProjectID] [INT] NOT NULL,
[Index] [INT] NOT NULL,
[Voltage] [DECIMAL](18, 3) NOT NULL,
[Current] [DECIMAL](18, 3) NOT NULL
)
And imagine we have the following data:
ProjectID Index Voltage Current
---------------------------------------
1 1 2.3 3.4
1 2 2.5 3.3
1 3 2.7 3.0
1 4 2.8 2.9
1 5 2.5 3.1
1 6 2.0 3.4
1 7 1.2 3.5
1 8 0.5 3.0
2 1 2.0 1.0
2 2 5.0 2.0
2 3 3.0 2.0
2 4 1.0 1.0
My goal is actually to do some aggregates between a starting point and an ending point ordered by the index column. When I mean starting and ending points, I mean, for example, start at the first row that has Voltage >= 2.5 and then continue on until I hit the last row that has Voltage >= 1.5
Here's an example query to illustrate:
WITH CTE AS
(
SELECT
StartingTable.ProjectID,
MIN(StartingTable.[Index]) StartingIndex,
MIN(EndingTable.[Index]) - 1 EndingIndex
FROM
TestTable StartingTable
JOIN TestTable EndingTable ON StartingTable.ProjectID = EndingTable.ProjectID
AND EndingTable.[Index] > StartingTable.[Index]
WHERE
StartingTable.Voltage >= 2.5
and EndingTable.Voltage <= 1.5
GROUP BY
StartingTable.ProjectID
)
SELECT
TestTable.ProjectID,
MAX(Voltage) MaxVoltage,
StartingIndex,
EndingIndex
FROM
TestTable
JOIN CTE ON TestTable.ProjectID = CTE.ProjectID
AND TestTable.[Index] >= StartingIndex
AND TestTable.[Index] <= EndingIndex
GROUP BY
TestTable.ProjectID,
StartingIndex,
EndingIndex
And in the example, it should be returning:
ProjectID MaxVoltage StartingIndex EndingIndex
1 2.800 2 6
2 5.000 2 3
That works ok, but I really don't like joining TestTable twice in order to get starting and ending index. We're dealing with a table that I think may eventually end up with terabytes worth of data, so I think this is a poor choice. I just don't know what else to do.
I was thinking about some way to use windowed functions, but I'm not sure it's even possible. It's almost like I want to do this:
MAX(Voltage) OVER (PARTITION BY ProjectID ORDER BY [Index] ROWS BETWEEN Voltage >= 2.5 AND Voltage >= 1.5)
I haven't seen where anything like that's possible. I also came up with the following:
WITH CTE AS
(
SELECT
ProjectID,
[Index],
MAX(Voltage) OVER (PARTITION BY ProjectId ORDER BY [Index] ROWS UNBOUNDED PRECEDING) MaxVoltage
FROM
TestTable
)
SELECT
TestTable.ProjectID,
MAX(Voltage) MaxVoltage,
MIN(TestTable.[Index]) StartingIndex,
MAX(TestTable.[Index]) EndingIndex
FROM
TestTable
JOIN CTE ON TestTable.ProjectID = CTE.ProjectID
AND TestTable.[Index] = CTE.[Index]
WHERE
MaxVoltage >= 2.5
AND Voltage >= 1.5
GROUP BY
TestTable.ProjectID
I'm not sure that is much better. Are there any better alternatives than what I'm already trying?
Upvotes: 2
Views: 318
Reputation: 60513
You can apply conditional aggregation if the voltage never goes above 2.5, then below 1.5 and then increases above 1.5 again:
SELECT
ProjectID,
max(Voltage) as MaxVoltage,
MIN(case when Voltage >= 2.5 then [index] end) AS StartingIndex,
MAX(case when Voltage >= 1.5 then [index] end) AS EndingIndex
FROM TestTable
group by ProjectID
having MAX(Voltage) >= 2.5 -- to filter group which never reached 2.5
See rextester fiddle
Edit:
If your Voltage has repeated groups between 2.5 and 1.5 query #2 by @Clockwork-Muse will work fine as long as there's no gap in the [index]
columns, otherwise it will split one result row into two groups. If you want to ignore gaps following Select will return the expected result:
with cte as
(
SELECT
ProjectID,
[Index],
Voltage,
max(case when Voltage < 1.5 then [Index] end)
over (partition by ProjectID
order by [Index]
rows unbounded preceding) AS grp -- same value for a range of rows >= 1.5
FROM TestTable
)
select
ProjectID,
max(Voltage) as MaxVoltage,
MIN(case when Voltage >= 2.5 then [index] end) AS StartingIndex,
MAX([index]) AS EndingIndex
from cte
where Voltage >=1.5
group by ProjectID, grp
having MAX(Voltage) >= 2.5 -- to filter group which never reached 2.5
order by ProjectID, grp
;
This groups contiguous rows with Voltage >= 1.5
and starts a new group whenever it drops below 1.5, see Clockwork-Muse's modified db<>fiddle
Upvotes: 2
Reputation: 13106
If, as in your example dataset, voltages only decrease after hitting 1.5 volts (and never repeat), we can cheat via use of a conditional aggregate:
SELECT [ProjectID], MAX([Voltage]) AS MaxVoltage,
MIN(CASE WHEN [Voltage] >= 2.5 THEN [Index] END) AS [StartingIndex],
MAX(CASE WHEN [Voltage] >= 1.5 THEN [Index] END) AS [EndingIndex]
FROM [dbo].[TestTable]
WHERE [Voltage] >= 1.5
GROUP BY [ProjectId]
HAVING MAX([Voltage]) >= 2.5
Example Fiddle
Which yields the requested:
ProjectID | MaxVoltage | StartingIndex | EndingIndex
--------: | :--------- | ------------: | ----------:
1 | 2.800 | 2 | 6
2 | 5.000 | 2 | 3
If, on the other hand, we need to be wary of restarts, things get more complicated, and we need to turn it into a variation on a gaps-and-islands solution:
SELECT [ProjectID], MAX([Voltage]) AS [MaxVoltage],
MIN(CASE WHEN [Voltage] >= 2.5 THEN [Index] END) AS [StartingIndex],
MAX(CASE WHEN [Voltage] >= 1.5 THEN [Index] END) AS [EndingIndex]
FROM (SELECT [ProjectId], [Index], [Voltage],
[Index] - ROW_NUMBER() OVER(PARTITION BY [ProjectID] ORDER BY [Index]) AS [VoltageRun]
FROM [dbo].[TestTable]
WHERE [Voltage] >= 1.5) [TestTable]
GROUP BY [ProjectID], [VoltageRun]
HAVING MAX([Voltage]) >= 2.5
ORDER BY [ProjectID], [VoltageRun]
This works because your table handily stores the (hopefully gapless) [Index]
column. By only selecting rows that are valid at all (>= 1.5
), the ROW_NUMBER()
subtraction gains us a "grouping column" - before the aggregation, the result set looks like this:
ProjectId | Index | Voltage | VoltageRun
--------: | ----: | :------ | :---------
1 | 1 | 2.300 | 0
1 | 2 | 2.500 | 0
1 | 3 | 2.700 | 0
1 | 4 | 2.800 | 0
1 | 5 | 2.500 | 0
1 | 6 | 2.000 | 0
1 | 9 | 2.300 | 2
1 | 10 | 2.500 | 2
1 | 11 | 2.700 | 2
1 | 12 | 2.800 | 2
1 | 13 | 2.500 | 2
1 | 14 | 2.000 | 2
2 | 1 | 2.000 | 0
2 | 2 | 5.000 | 0
2 | 3 | 3.000 | 0
(The test data for [ProjectID]=1
has been repeated)
After that, we just need to include the grouping column as an extra qualifier in the original query.
(Note that this type of query is one of the few times it makes sense to leave the grouping column out of the SELECT
list)
Upvotes: 0
Reputation: 1258
SELECT tt.ProjectID,
MAX(tt.Voltage) AS MaxVoltage,
x.StartIndex,
MAX(tt.[Index]) AS EndIndex
FROM TestTable AS tt
JOIN
(
SELECT ProjectID,
MIN([Index]) AS StartIndex
FROM TestTable
WHERE Voltage >= 2.5
GROUP BY ProjectID
) AS x ON tt.ProjectID = x.ProjectID
WHERE tt.Voltage >= 1.5
AND tt.[Index] >= x.StartIndex
GROUP BY tt.ProjectID, x.StartIndex
See the full test here: https://rextester.com/BCVL10968
Upvotes: 0