Reputation: 13
I have data with a few columns -- Date, GroupID , PersonID, and Value. There are multiple GroupIDs on each date; there are multiple PersonIDs in each group, and there are multiple Values for each person, in each group, on each day. What I would like to do is summarize these, by finding the average Value and 90th percentile Value for each person in each group on each day. So every row would have a unique Date-GroupID-PersonID identifier, and then would have the average and 90th percentile Value associated with that combination.
This works for just running the average, or just running the percentile, but when I need to include both, is where I run into issues. This is currently the code I am using:
select distinct GroupID
, PersonID
, Date
, avg(Value) as Value_Avg
, PERCENTILE_CONT(0.9) within group (order by Value) over (partition by GroupID, PersonID) as Value_90th
from Table
group by Date, GroupID, PersonID
The error it gives me is: "Column 'Values' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
If I add Value in the group by clause, it gives me the 90th percentile correctly, but leaves each Value in its own row and doesn't average them under the average column. If I add an aggregate function inside the Percentile function, such as "order by avg(Value)" it correctly averages the Values in the average column, but also just gives me the average in the Percentile column, not the 90th percentile. I tried using other aggregates, such as min or max, and it would give me the min or max value instead of the percentile as well. Additionally, the datatype of the column Value is decimal, which the internet tells me shouldn't require an aggregate, but alas, it does.
So, I'm in a bit of a pickle... either way I try to resolve it doesn't get me what I want. Any help or insight would be appreciated!
Upvotes: 1
Views: 7591
Reputation: 88971
The tricky thing here is that the analytic functions, like PERCENTILE_CONT, aren't aggregates.
so this
use tempdb
go
drop table if exists t
go
create table t(GroupID int, PersonId int, Date date, Value float)
insert into t(GroupID, PersonId,Date, Value)
values
(1,1,'20190829',10),
(1,1,'20190829',20),
(1,1,'20190829',30),
(1,1,'20190829',40),
(1,1,'20190829',50),
(1,2,'20190829',20),
(1,2,'20190829',40),
(1,2,'20190829',60),
(1,2,'20190829',80),
(1,2,'20190829',100)
go
select GroupID
, PersonID
, Date
, PERCENTILE_CONT(0.9) within group (order by Value) over (partition by Date, GroupID, PersonID) as Value_90th
from t
outputs
GroupID PersonID Date Value_90th
----------- ----------- ---------- ----------------------
1 1 2019-08-29 46
1 1 2019-08-29 46
1 1 2019-08-29 46
1 1 2019-08-29 46
1 1 2019-08-29 46
1 2 2019-08-29 92
1 2 2019-08-29 92
1 2 2019-08-29 92
1 2 2019-08-29 92
1 2 2019-08-29 92
(10 rows affected)
So if you then want to aggregate, you then have apply an aggregate function or group by to that column. Like this:
select GroupID
, PersonID
, Date
, avg(PERCENTILE_CONT(0.9) within group (order by Value) over (partition by Date, GroupID, PersonID)) as Value_90th
, avg(Value) as Value_Avg
from t
group by Date, GroupID, PersonID
Which happens to be illegal, and fails with
Msg 8120, Level 16, State 1, Line 28
Column 't.Value' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Msg 4109, Level 15, State 1, Line 28
Windowed functions cannot be used in the context of another windowed function or aggregate.
So, like with other analytic functions, you apply them in a subquery or CTE, like this:
with q as
(
select GroupID
, PersonID
, Date
, Value
, PERCENTILE_CONT(0.9) within group (order by Value) over (partition by Date, GroupID, PersonID) as Value_90th
from t
)
select Date, GroupID, PersonID, avg(value) avg_value, avg(Value_90th) Value_90th
from q
group by Date, GroupID, PersonID
Which outputs
Date GroupID PersonID avg_value Value_90th
---------- ----------- ----------- ---------------------- ----------------------
2019-08-29 1 1 30 46
2019-08-29 1 2 60 92
(2 rows affected)
Upvotes: 2