Reputation: 1170
I need to compute a monthly weighted average. The data frame looks like this:
Month Variable Weighting
460773 1998-06-01 11 153.00
337134 1998-06-01 9 0.96
473777 1998-06-01 10 264.00
358226 1998-06-01 6 0.52
414626 1998-06-01 10 34.00
341020 1998-05-01 9 1.64
453066 1998-05-01 5 26.00
183276 1998-05-01 8 0.51
403729 1998-05-01 6 123.00
203005 1998-05-01 11 0.89
When I use aggregate
e.g.,
Output <- aggregate(Variable ~ Month, df , mean )
Output
Month Variable
1 1998-05-01 7.8
2 1998-06-01 9.2
I get correct results, however, when I try to add weight to the aggregation e.g.,
Output <- aggregate(Variable ~ Month, df , FUN = weighted.mean, w = df$Weighting)
I get a different-vector-lenghts error:
Error in weighted.mean.default(X[[1L]], ...) :
'x' and 'w' must have the same length
Is there a way to remedy this situation?
Upvotes: 0
Views: 2181
Reputation: 2621
In case you don't have plyr
, dplyr
or data.table
installed and cannot install them due to some reasons, it is still possible to use aggregate
to compute monthly weighted average, all you need is to do the following trick,
df$row <- 1:nrow(df) #the trick
aggregate(row~Month, df, function(i) mean(df$Variable[i])) #mean
aggregate(row~Month, df, function(i) weighted.mean(df$Variable[i], df$Weighting[i])) #weighted mean
Here are the outputs:
Mean:
> aggregate(row~Month, df, function(i) mean(df$Variable[i]))
Month row
1 1998-05-01 7.8
2 1998-06-01 9.2
Weighted mean:
> aggregate(row~Month, df, function(i) weighted.mean(df$Variable[i], df$Weighting[i]))
Month row
1 1998-05-01 5.89733
2 1998-06-01 10.33142
Upvotes: 3
Reputation: 12569
With aggregate()
it is not possible, because your weight vector is not partitionated during aggregate()
. You can use by()
or split()
plus sapply()
or additional package data.table
or function ddply()
from package plyr
or functions from the package dplyr
example with split()
plus sapply()
:
sapply(split(df, df$Month), function(d) weighted.mean(d$Variable, w = d$Weighting))
result:
1998-05-01 1998-06-01
5.89733 10.33142
a variant with by()
by(df, df$Month, FUN=function(d) weighted.mean(d$Variable, w = d$Weighting)) # or
unclass(by(df, df$Month, FUN=function(d) weighted.mean(d$Variable, w = d$Weighting)))
with package plyr
library(plyr)
ddply(df, ~Month, summarize, weighted.mean(Variable, w=Weighting))
with data.table
library(data.table)
setDT(df)[, weighted.mean(Variable, w = Weighting), Month]
Upvotes: 3