Zlo
Zlo

Reputation: 1170

Using aggregate to compute monthly weighted average

I need to compute a monthly weighted average. The data frame looks like this:

            Month Variable Weighting
460773 1998-06-01       11    153.00
337134 1998-06-01        9      0.96
473777 1998-06-01       10    264.00
358226 1998-06-01        6      0.52
414626 1998-06-01       10     34.00
341020 1998-05-01        9      1.64
453066 1998-05-01        5     26.00
183276 1998-05-01        8      0.51
403729 1998-05-01        6    123.00
203005 1998-05-01       11      0.89

When I use aggregate e.g.,

 Output <- aggregate(Variable ~ Month, df , mean )
 Output
       Month Variable
1 1998-05-01      7.8
2 1998-06-01      9.2

I get correct results, however, when I try to add weight to the aggregation e.g.,

Output <- aggregate(Variable ~ Month, df , FUN = weighted.mean, w = df$Weighting)

I get a different-vector-lenghts error:

Error in weighted.mean.default(X[[1L]], ...) : 
'x' and 'w' must have the same length

Is there a way to remedy this situation?

Upvotes: 0

Views: 2181

Answers (2)

pe-perry
pe-perry

Reputation: 2621

In case you don't have plyr, dplyr or data.table installed and cannot install them due to some reasons, it is still possible to use aggregate to compute monthly weighted average, all you need is to do the following trick,

df$row <- 1:nrow(df) #the trick
aggregate(row~Month, df, function(i) mean(df$Variable[i])) #mean
aggregate(row~Month, df, function(i) weighted.mean(df$Variable[i], df$Weighting[i])) #weighted mean

Here are the outputs:

Mean:

> aggregate(row~Month, df, function(i) mean(df$Variable[i]))
       Month row
1 1998-05-01 7.8
2 1998-06-01 9.2

Weighted mean:

> aggregate(row~Month, df, function(i) weighted.mean(df$Variable[i], df$Weighting[i]))
       Month      row
1 1998-05-01  5.89733
2 1998-06-01 10.33142

Upvotes: 3

jogo
jogo

Reputation: 12569

With aggregate() it is not possible, because your weight vector is not partitionated during aggregate(). You can use by() or split() plus sapply() or additional package data.table or function ddply() from package plyr or functions from the package dplyr

example with split() plus sapply():

sapply(split(df, df$Month), function(d) weighted.mean(d$Variable, w = d$Weighting))

result:

1998-05-01 1998-06-01 
   5.89733   10.33142 

a variant with by()

by(df, df$Month, FUN=function(d) weighted.mean(d$Variable, w = d$Weighting)) # or
unclass(by(df, df$Month, FUN=function(d) weighted.mean(d$Variable, w = d$Weighting)))

with package plyr

library(plyr)
ddply(df, ~Month, summarize, weighted.mean(Variable, w=Weighting))

with data.table

library(data.table)
setDT(df)[, weighted.mean(Variable, w = Weighting), Month]

Upvotes: 3

Related Questions