ProfK
ProfK

Reputation: 51064

Weighted average in T-SQL (like Excel's SUMPRODUCT)

I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation):

(A1*B1)+(A2*B2)+...+(An*Bn)/SUM(A1:An)

The first part reflects the same functionality as Excel's SUMPRODUCT() function.

My catch is that I need to dynamically specify which row gets averaged with weights, and which row the weights come from, and a date range.

EDIT: This is easier than I thought, because Excel was making me think I required some kind of pivot. My solution so far is thus:

select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
    select RecordDate , Actual 
    from CalcProductionRecords 
    where KPI = 'Weighty'
) baseSeries inner join (       
    select RecordDate , Actual 
    from CalcProductionRecords 
    where KPI = 'Tons Milled'   
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate

Upvotes: 17

Views: 37361

Answers (3)

MatBailie
MatBailie

Reputation: 86715

Quassnoi's answer shows how to do the SumProduct, and using a WHERE clause would allow you to restrict by a Date field...

SELECT
   SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
   [tbl]
WHERE
   [tbl].date >= '2009 Jan 01'
   AND [tbl].date < '2010 Jan 01'

The more complex part is where you want to "dynamically specify" the what field is [data] and what field is [weight]. The short answer is that realistically you'd have to make use of Dynamic SQL. Something along the lines of:
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string

Dynamic SQL, however, carries it's own overhead. Is the queries are relatively infrequent , or the execution time of the query itself is relatively long, this may not matter. If they are common and short, however, you may notice that using dynamic sql introduces a noticable overhead. (Not to mention being careful of SQL injection attacks, etc.)

EDIT:

In your lastest example you highlight three fields:

  • RecordDate
  • KPI
  • Actual

When the [KPI] is "Weight Y", then [Actual] the Weighting Factor to use.
When the [KPI] is "Tons Milled", then [Actual] is the Data you want to aggregate.


Some questions I have are:

  • Are there any other fields?
  • Is there only ever ONE actual per date per KPI?

The reason I ask being that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resultsing records)

Regardless, a slight simplification of your query is certainly possible...

SELECT
   SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
   CalcProductionRecords AS [baseSeries]
INNER JOIN
   CalcProductionRecords AS [weightSeries]
      ON [weightSeries].RecordDate = [baseSeries].RecordDate
--    AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
   [baseSeries].KPI = 'Tons Milled'
   AND [weightSeries].KPI = 'Weighty'

The commented out line only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.


If you can't guarnatee just One value per date, and don't have any other fields to join on, you can modify your sub_query based version slightly...

SELECT
   SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
    SELECT
        RecordDate,
        SUM(Actual)
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Tons Milled'
    GROUP BY
        RecordDate
)
   AS [baseSeries]
INNER JOIN
(
    SELECT
        RecordDate,
        AVG(Actual)
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Weighty'
    GROUP BY
        RecordDate
)
   AS [weightSeries]
      ON [weightSeries].RecordDate = [baseSeries].RecordDate

This assumes the AVG of the weight is valid if there are multiple weights for the same day.


EDIT : Someone just voted for this so I thought I'd improve the final answer :)

SELECT
   SUM(Actual * Weight) / SUM(Weight)
FROM
(
    SELECT
        RecordDate,
        SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END)   AS Actual,
        AVG(CASE WHEN KPI = 'Weighty'     THEN Actual ELSE NULL END)   AS Weight
    FROM
        CalcProductionRecords
    WHERE
        KPI IN ('Tons Milled', 'Weighty')
    GROUP BY
        RecordDate
)
   AS pivotAggregate

This avoids the JOIN and also only scans the table once.

It relies on the fact that NULL values are ignored when calculating the AVG().

Upvotes: 20

priyanka.sarkar
priyanka.sarkar

Reputation: 26498

If I have understand the problem then try this

SET DATEFORMAT dmy
    declare @tbl table(A int, B int,recorddate datetime,KPI varchar(50))
    insert into @tbl 
        select 1,10 ,'21/01/2009', 'Weighty'union all 
        select 2,20,'10/01/2009', 'Tons Milled' union all
        select 3,30 ,'03/02/2009', 'xyz'union all 
        select 4,40 ,'10/01/2009', 'Weighty'union all
        select 5,50 ,'05/01/2009', 'Tons Milled'union all 
        select 6,60,'04/01/2009', 'abc' union all
        select 7,70 ,'05/01/2009', 'Weighty'union all 
        select 8,80,'09/01/2009', 'xyz' union all
        select 9,90 ,'05/01/2009', 'kws'    union all 
        select 10,100,'05/01/2009', 'Tons Milled'

    select SUM(t1.A*t2.A)/SUM(t2.A)Result  from  
                   (select RecordDate,A,B,KPI from @tbl)t1 
        inner join(select RecordDate,A,B,KPI from @tbl t)t2
        on t1.RecordDate = t2.RecordDate
        and t1.KPI = t2.KPI

Upvotes: 1

Quassnoi
Quassnoi

Reputation: 425371

SELECT  SUM(A * B) / SUM(A)
FROM    mytable

Upvotes: 12

Related Questions