Reputation: 573

SQL Server complex aggregate filtering

I'm attempting to optimize some queries that run against a large amount of data. I'll try simplifying the problem here. Let's start with an example table:

CREATE TABLE [dbo].[TestTable]
(
    [ProjectID] [INT] NOT NULL,
    [Index] [INT] NOT NULL,
    [Voltage] [DECIMAL](18, 3) NOT NULL,
    [Current] [DECIMAL](18, 3) NOT NULL
)

And imagine we have the following data:

ProjectID   Index   Voltage     Current
---------------------------------------
1           1       2.3         3.4 
1           2       2.5         3.3
1           3       2.7         3.0
1           4       2.8         2.9
1           5       2.5         3.1
1           6       2.0         3.4
1           7       1.2         3.5
1           8       0.5         3.0
2           1       2.0         1.0
2           2       5.0         2.0
2           3       3.0         2.0
2           4       1.0         1.0

My goal is actually to do some aggregates between a starting point and an ending point ordered by the index column. When I mean starting and ending points, I mean, for example, start at the first row that has Voltage >= 2.5 and then continue on until I hit the last row that has Voltage >= 1.5

Here's an example query to illustrate:

WITH CTE AS
(
    SELECT
        StartingTable.ProjectID,
        MIN(StartingTable.[Index]) StartingIndex,
        MIN(EndingTable.[Index]) - 1 EndingIndex
    FROM
        TestTable StartingTable
        JOIN TestTable EndingTable ON StartingTable.ProjectID = EndingTable.ProjectID
            AND EndingTable.[Index] > StartingTable.[Index]
    WHERE
        StartingTable.Voltage >= 2.5
        and EndingTable.Voltage <= 1.5
    GROUP BY
        StartingTable.ProjectID
)
SELECT
    TestTable.ProjectID,
    MAX(Voltage) MaxVoltage,
    StartingIndex,
    EndingIndex
FROM
    TestTable
    JOIN CTE ON TestTable.ProjectID = CTE.ProjectID
        AND TestTable.[Index] >= StartingIndex
        AND TestTable.[Index] <= EndingIndex
GROUP BY
    TestTable.ProjectID,
    StartingIndex,
    EndingIndex

And in the example, it should be returning:

ProjectID MaxVoltage StartingIndex EndingIndex
1         2.800      2             6
2         5.000      2             3

That works ok, but I really don't like joining TestTable twice in order to get starting and ending index. We're dealing with a table that I think may eventually end up with terabytes worth of data, so I think this is a poor choice. I just don't know what else to do.

I was thinking about some way to use windowed functions, but I'm not sure it's even possible. It's almost like I want to do this:

MAX(Voltage) OVER (PARTITION BY ProjectID ORDER BY [Index] ROWS BETWEEN Voltage >= 2.5 AND Voltage >= 1.5)

I haven't seen where anything like that's possible. I also came up with the following:

WITH CTE AS
(
    SELECT
        ProjectID,
        [Index],
        MAX(Voltage) OVER (PARTITION BY ProjectId ORDER BY [Index] ROWS UNBOUNDED PRECEDING) MaxVoltage
    FROM
        TestTable
)
SELECT
    TestTable.ProjectID,
    MAX(Voltage) MaxVoltage,
    MIN(TestTable.[Index]) StartingIndex,
    MAX(TestTable.[Index]) EndingIndex
FROM
    TestTable
    JOIN CTE ON TestTable.ProjectID = CTE.ProjectID
        AND TestTable.[Index] = CTE.[Index]
WHERE
    MaxVoltage >= 2.5
    AND Voltage >= 1.5
GROUP BY
    TestTable.ProjectID

I'm not sure that is much better. Are there any better alternatives than what I'm already trying?

Upvotes: 2

Answers (3)

dnoeth

Reputation: 60513

You can apply conditional aggregation if the voltage never goes above 2.5, then below 1.5 and then increases above 1.5 again:

SELECT
   ProjectID,
   max(Voltage) as MaxVoltage,
   MIN(case when Voltage >= 2.5 then [index] end) AS StartingIndex,
   MAX(case when Voltage >= 1.5 then [index] end) AS EndingIndex
FROM TestTable
group by ProjectID
having MAX(Voltage) >= 2.5 -- to filter group which never reached 2.5

See rextester fiddle

Edit:

If your Voltage has repeated groups between 2.5 and 1.5 query #2 by @Clockwork-Muse will work fine as long as there's no gap in the [index] columns, otherwise it will split one result row into two groups. If you want to ignore gaps following Select will return the expected result:

with cte as 
(
   SELECT
      ProjectID,
      [Index],
      Voltage,
      max(case when Voltage < 1.5 then [Index] end)
      over (partition by ProjectID
            order by [Index]
            rows unbounded preceding) AS grp -- same value for a range of rows >= 1.5
   FROM TestTable
 )
select
   ProjectID,
   max(Voltage) as MaxVoltage,
   MIN(case when Voltage >= 2.5 then [index] end) AS StartingIndex,
   MAX([index]) AS EndingIndex
from cte
where Voltage >=1.5
group by ProjectID, grp
having MAX(Voltage) >= 2.5 -- to filter group which never reached 2.5
order by ProjectID, grp
;

This groups contiguous rows with Voltage >= 1.5 and starts a new group whenever it drops below 1.5, see Clockwork-Muse's modified db<>fiddle

Upvotes: 2

Clockwork-Muse

Reputation: 13106

If, as in your example dataset, voltages only decrease after hitting 1.5 volts (and never repeat), we can cheat via use of a conditional aggregate:

SELECT [ProjectID], MAX([Voltage]) AS MaxVoltage, 
       MIN(CASE WHEN [Voltage] >= 2.5 THEN [Index] END) AS [StartingIndex],
       MAX(CASE WHEN [Voltage] >= 1.5 THEN [Index] END) AS [EndingIndex]
FROM [dbo].[TestTable]
WHERE [Voltage] >= 1.5
GROUP BY [ProjectId]
HAVING MAX([Voltage]) >= 2.5

Example Fiddle
Which yields the requested:

ProjectID | MaxVoltage | StartingIndex | EndingIndex
--------: | :--------- | ------------: | ----------:
        1 | 2.800      |             2 |           6
        2 | 5.000      |             2 |           3

If, on the other hand, we need to be wary of restarts, things get more complicated, and we need to turn it into a variation on a gaps-and-islands solution:

SELECT [ProjectID], MAX([Voltage]) AS [MaxVoltage],
       MIN(CASE WHEN [Voltage] >= 2.5 THEN [Index] END) AS [StartingIndex],
       MAX(CASE WHEN [Voltage] >= 1.5 THEN [Index] END) AS [EndingIndex]
FROM (SELECT [ProjectId], [Index], [Voltage], 
             [Index] - ROW_NUMBER() OVER(PARTITION BY [ProjectID] ORDER BY [Index]) AS [VoltageRun]
      FROM [dbo].[TestTable]       
      WHERE [Voltage] >= 1.5) [TestTable]
GROUP BY [ProjectID], [VoltageRun]
HAVING MAX([Voltage]) >= 2.5
ORDER BY [ProjectID], [VoltageRun]

Example Fiddle

This works because your table handily stores the (hopefully gapless) [Index] column. By only selecting rows that are valid at all (>= 1.5), the ROW_NUMBER() subtraction gains us a "grouping column" - before the aggregation, the result set looks like this:

ProjectId | Index | Voltage | VoltageRun
--------: | ----: | :------ | :---------
        1 |     1 | 2.300   | 0         
        1 |     2 | 2.500   | 0         
        1 |     3 | 2.700   | 0         
        1 |     4 | 2.800   | 0         
        1 |     5 | 2.500   | 0         
        1 |     6 | 2.000   | 0         
        1 |     9 | 2.300   | 2         
        1 |    10 | 2.500   | 2         
        1 |    11 | 2.700   | 2         
        1 |    12 | 2.800   | 2         
        1 |    13 | 2.500   | 2         
        1 |    14 | 2.000   | 2         
        2 |     1 | 2.000   | 0         
        2 |     2 | 5.000   | 0         
        2 |     3 | 3.000   | 0

(The test data for [ProjectID]=1 has been repeated)

After that, we just need to include the grouping column as an extra qualifier in the original query.
(Note that this type of query is one of the few times it makes sense to leave the grouping column out of the SELECT list)

Upvotes: 0

Indominus

Reputation: 1258

SELECT tt.ProjectID, 
       MAX(tt.Voltage) AS MaxVoltage,
       x.StartIndex,
       MAX(tt.[Index]) AS EndIndex
FROM TestTable AS tt
JOIN 
(  
    SELECT ProjectID, 
           MIN([Index]) AS StartIndex
    FROM TestTable
    WHERE Voltage >= 2.5
    GROUP BY ProjectID 
) AS x ON tt.ProjectID = x.ProjectID 
WHERE tt.Voltage >= 1.5 
  AND tt.[Index] >= x.StartIndex
GROUP BY tt.ProjectID, x.StartIndex

See the full test here: https://rextester.com/BCVL10968

Upvotes: 0

SQL Server complex aggregate filtering

Answers (3)

Related Questions