Reputation: 51200

Summarize data that has already been grouped

I have a data set that looks like this:

  User  |  Task  |  Time
--------|--------|--------
 User A | Task X |  100
 User A | Task Y |  200
 User A | Task Z |  300
 User B | Task X |  400
 User B | Task Y |  500
 User B | Task Z |  600
 User C | Task X |  700
 User C | Task Y |  800
 User C | Task Z |  900
 User D | Task X | 1000
 User D | Task Y | 1100
 user D | Task Z | 1200

When I do my initial grouping, the data looks like this:

       | Avg User | Avg Task X | Avg Task Y | Avg Task Z
  User |   Time   |    Time    |    Time    |    Time   
-------|----------|------------|------------|------------
User A |    200   |    100     |    200     |    300
User B |    500   |    400     |    500     |    600
User C |    800   |    700     |    800     |    900
User D |   1100   |   1000     |   1100     |   1200

I need it to look like this:

      | Avg User | Avg Task X | Avg Task Y | Avg Task Z
 User |   Time   |    Time    |    Time    |    Time   
------|----------|------------|------------|------------
 All  |    650   |    550     |    650     |    750

This is how I got those numbers:

 650 = (200+500+800+1100) / 4
 550 = (100+400+700+1000) / 4
 650 = (200+500+800+1100) / 4
 750 = (300+600+900+1200) / 4

In other words, I have a column group on Task and a row group on User. The problem is that I want the row group to get summarized an extra time.

At first glance I could just return the user's name back as 'All' and it would summarize but this doesn't actually give me the averages that I need. I need to first SUM the times by user, and then find the average per user. If I change the way the original data is shaped, my task groups will no longer work properly.

If I try to use a "Totals" row on my row group, it aggregates the ORIGINAL data and not the summarized/grouped data. That is rather disappointing because it is actually incorrect in my eyes.

Upvotes: 3

Answers (4)

DRapp

Reputation: 48169

Here's the query I would write that works... The "PreQuery" is done to group the counts and sum of each element for a given user... Then that is rolled-up to the top-most level of "All". Now, this is based on your data sample.

SELECT
    AVG( TaskTime / TaskCount ) as TaskAvg,
    SUM( XTime ) / SUM( XCount ) as XAvg,
    SUM( YTime ) / SUM( YCount ) as YAvg,
    SUM( ZTime ) / SUM( ZCount ) as ZAvg
   from 
      ( SELECT 
              user,
              COUNT(*) as TaskCount,
              SUM( Time ) as TaskTime,
              CASE WHEN Task = "Task X" THEN 1 ELSE 0 END as XCount,
              CASE WHEN Task = "Task X" THEN Time ELSE 0 END as XTime,
              CASE WHEN Task = "Task Y" THEN 1 ELSE 0 END as YCount,
              CASE WHEN Task = "Task Y" THEN Time ELSE 0 END as YTime,
              CASE WHEN Task = "Task Z" THEN 1 ELSE 0 END as ZCount,
              CASE WHEN Task = "Task Z" THEN Time ELSE 0 END as ZTime
           FROM 
              AllUsersTasks
           group by ;
              user ) PreQuery

If your data could provide that a given user has multiple entries for a single Task, such as 3 entries for User A, Task X has Times of 95, 100 and 105, you have 3 entries for 300 which results in the 100. This could skew your OVERALL Average of this task and would have to modify the query. Let me know if a person will have multiple entries per a given task based on production data... If so, then THAT element would probably need to be put into its OWN pre-query where the "From AllUserTasks" table is.

Upvotes: 0

AndrewBay

Reputation: 938

I would do this in a sql script, doing this in reporting would be overkill (although it probably would be possible).

I have and example script right here:

drop table #tmp, #tmp2, #tmp3

select 'User A' as [User],' Task X ' as [Task],100.00 as [Time]
into #tmp
union all 
select 'User A ',' Task Y ',200
union all 
select 'User A ',' Task Z ',300
union all
select 'User B ',' Task X ',400
union all 
select 'User B ',' Task Y ',500
union all 
select 'User B ',' Task Z ',600
union all 
select 'User C ',' Task X ',700
union all 
select 'User C ',' Task Y ',800
union all 
select 'User C ',' Task Z ',900
union all 
select 'User D ',' Task X ',1000
union all 
select 'User D ',' Task Y ',1100
union all 
select 'User D ',' Task Z ',1200

select [User],
   Task,
   Sum(time) as time
into #tmp2
from #tmp
group by [User],
   [Task]

select [User],
   avg(time) as time
into #tmp3
from #tmp2
group by [User];

declare @statement nvarchar(max);
select @statement = 
'with cteTimes as (
   select * 
   from #tmp2 t
      pivot (sum (t.[time]) for Task in (' + stuff((select ', ' + quotename([Task]) from #tmp group by [Task] for xml path, type).value('.','varchar(max)'), 1, 2, '') + ')) as Task
)
select ''All'' as [User],
   (select avg(usr.time) from #tmp3 usr),'
 + stuff((select ', avg(' + quotename([Task]) + ') as ' + quotename([Task]) from #tmp group by [Task] for xml path, type).value('.','varchar(max)'), 1, 2, '') + 
 +'from cteTimes x ';

exec sp_executesql @statement;

The script can probably be optimized by using a pivot instead of multiple joins while creating the #tmp4. My example is just explanatory.

Upvotes: 0

Jason Horner

Reputation: 3690

Assuming your source is SQL Server 2008 you might be able to use a combination of grouping sets:

http://technet.microsoft.com/en-us/library/bb522495.aspx

And the SSRS Aggregate Function:

http://msdn.microsoft.com/en-us/library/ms155830(v=sql.90).aspx

This blog has an example that may also be helpful

http://beyondrelational.com/blogs/jason/archive/2010/07/03/aggregate-of-an-aggregate-function-in-ssrs.aspx

Good Luck

Upvotes: 0

Etch

Reputation: 3054

The only way I was able to do this type of functionality is to was to use the Code section of the report. I would keep track of the group data I wanted to summarize in a global variable in that I would later output to the field that I wanted.

Here is a microsoft article to describe how to embed code into your report http://msdn.microsoft.com/en-us/library/ms159238.aspx

Here is a much more detailed way to solve your problem. Link

Upvotes: 1

Summarize data that has already been grouped

Answers (4)

Related Questions