Chris G.

Reputation: 3981

TSQL GROUP BY without aggregating

Consider the following two tables:

key_names
----------------------------------------------
id | name
----------------------------------------------
1    Outside Temperature
2    Inside Temperature
3    Relative Humidity 

key_values
----------------------------------------------
id | time                    | key_id | value
----------------------------------------------
1    2013-06-04 13:20:16.347   1        50
2    2013-06-04 13:20:16.348   2        30
3    2013-06-04 13:20:16.349   3        10
4    2013-06-04 13:20:31.120   1        55
5    2013-06-04 13:20:31.122   2        29
6    2013-06-04 13:20:31.120   3        11

Values are written to the database at 15 second intervals for 3 different keys. I want to be able to pull out values at 1 minute, 5 minute, 1 hour, and other intervals.

This is what I have right now for a five minute interval:

SELECT time, key_name, value
FROM key_values kv
JOIN key_names kn ON kv.key_id = kn.id
WHERE (key_id = 1 OR key_id = 2 OR key_id = 3)
    AND time >= '2013-06-04 12:20:30' 
    AND time < '2013-06-04 13:20:30'
GROUP BY DATEPART(YEAR, time), DATEPART(MONTH, time), DATEPART(DAY, time),
    DATEPART(HOUR, time), (DATEPART(MINUTE, time) / 5), kv.key_id, kn.key_name
ORDER BY kv.time DESC

Of course, this query is going to give me the following error:

Column 'key_values.time' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

But I don't want any aggregate data! I just want the actual values for the rows at the five minute interval. How can I fix this query?

Expected Result Set: (for 5 minute interval)

----------------------------------------------------
time                | key_id | value
----------------------------------------------------
2013-06-04 13:20:16   1        50
2013-06-04 13:20:16   2        30
2013-06-04 13:20:16   3        10
2013-06-04 13:15:16   1        ...
2013-06-04 13:15:16   2        ...
2013-06-04 13:15:16   3        ...
2013-06-04 13:10:16   1        ...
2013-06-04 13:10:16   2        ...
2013-06-04 13:10:16   3        ...
...                   ...      ...

For one of the guys answering my question:

SELECT time, key_name, value
FROM (
        SELECT time, key_name, value, ROW_NUMBER() over (partition by convert(varchar(16), time, 121) order by time desc) as seqnum
        FROM key_values kv
        JOIN key_names kn ON kn.id = kv.tag_id
        WHERE (key_id = 1 OR key_id = 2 )
                AND time >= '2013-06-04 12:20:30' AND time < '2013-06-04 13:20:30'
) k
where seqnum = 1;


It's only returning one of the keys:

2013-06-04 12:20:59.577 Outside Temperature     45
2013-06-04 12:21:59.607 Outside Temperature     45
2013-06-04 12:22:59.637 Outside Temperature     45
2013-06-04 12:23:59.687 Outside Temperature     45
2013-06-04 12:24:59.697 Outside Temperature     46
2013-06-04 12:25:59.723 Outside Temperature     46

Upvotes: 1

Views: 293

Answers (1)

Gordon Linoff

Reputation: 1271171

You want to use the row_number() function to enumerate the rows for each key within each interval. Then just take the last one or the first one, depending on whether you want the latest value in each interval or the earliest.

Here is an example for the 1-minute interval:

select time, key_name, value
from (SELECT time, key_name, value,
             ROW_NUMBER() over (partition by key_name, convert(varchar(16), time, 121) order by time desc) as seqnum
      FROM key_values kv join
           key_names kn
           ON kv.key_id = kn.id
     ) k
where seqnum = 1;

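The expression convert(varchar(16), time, 121) truncates the datetime to the minute: style 121 formats it as yyyy-mm-dd hh:mi:ss.mmm, and keeping only the first 16 characters drops the seconds and milliseconds. Partitioning by key_name as well is what gives you one row per key per minute rather than one row per minute.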
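To see the truncation in isolation, here is a small sketch that applies the same convert() call to two timestamps taken from the question's sample data. The literals and the column aliases are only for illustration.

```sql
-- Style 121 renders a datetime as 'yyyy-mm-dd hh:mi:ss.mmm';
-- varchar(16) keeps only 'yyyy-mm-dd hh:mi', so both rows fall in the same minute.
SELECT
    CONVERT(varchar(16), CAST('2013-06-04 13:20:16.347' AS datetime), 121) AS first_row,  -- '2013-06-04 13:20'
    CONVERT(varchar(16), CAST('2013-06-04 13:20:31.120' AS datetime), 121) AS second_row; -- '2013-06-04 13:20'
```

Because both rows map to the same string, they land in the same partition, and row_number() then picks one of them.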

EDIT: Other increments. The following example shows how to do this for spans of 15 minutes:

(partition by key_name, datediff(minute, 0, time)/15 order by . . . )

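For 5 minute increments use 5 instead of 15, and so on. This calculates the number of minutes since the beginning of time (database-ly speaking, date 0 is 1900-01-01). SQL Server does integer division, so dividing by the integer creates an identifier for the time span. For other units, you can replace minute with hour, day, month, or year. Be careful with second: datediff() returns an int, and the number of seconds since 1900 overflows it for current dates, so measure from a more recent base date in that case.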
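Putting the pieces together for the 5 minute case in the question, a minimal sketch might look like the following. It assumes the table layout shown in the question (key_names with id and name, key_values with id, time, key_id, and value) and that time is a datetime column; the key_name alias is introduced here for readability.

```sql
-- Sketch: keep the latest row per key in each 5 minute bucket.
SELECT k.[time], k.key_name, k.value
FROM (
    SELECT kv.[time], kn.name AS key_name, kv.value,
           ROW_NUMBER() OVER (
               -- one partition per key per 5 minute span
               PARTITION BY kv.key_id, DATEDIFF(minute, 0, kv.[time]) / 5
               ORDER BY kv.[time] DESC
           ) AS seqnum
    FROM key_values kv
    JOIN key_names kn ON kn.id = kv.key_id
    WHERE kv.key_id IN (1, 2, 3)
      AND kv.[time] >= '2013-06-04 12:20:30'
      AND kv.[time] <  '2013-06-04 13:20:30'
) k
WHERE k.seqnum = 1
ORDER BY k.[time] DESC;
```

The buckets are aligned to the clock (13:15:00 to 13:19:59, 13:20:00 to 13:24:59, and so on), so the first and last bucket in the filtered range may cover less than a full five minutes.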

For the original query, I figured that convert() was easier to understand.

Upvotes: 5
