Reputation: 5139
Suppose I have this data.frame in R:
ages <- data.frame(Indiv = numeric(),
                   Age = numeric(),
                   W = numeric())
ages[1,] <- c(1,10,2)
ages[2,] <- c(1,15,5)
ages[3,] <- c(2,5,1)
ages[4,] <- c(2,100,2)
ages
  Indiv Age W
1     1  10 2
2     1  15 5
3     2   5 1
4     2 100 2
If I do:
meanAge <- aggregate(ages$Age,list(ages$Indiv),mean)
I get the mean Age (x) for each Indiv (Group.1):
  Group.1    x
1       1 12.5
2       2 52.5
But I want to calculate the weighted arithmetic mean of Age (weight being W). If I do:
WmeanAge <- aggregate(ages$Age,list(ages$Indiv),weighted.mean,ages$W)
I get:
Error in weighted.mean.default(X[[1L]], ...) :
'x' and 'w' must have the same length
I think I should have:
  Group.1           x
1       1 13.57142857
2       2 68.33333333
What am I doing wrong? Thanks in advance!
Upvotes: 2
Views: 7966
Reputation: 2417
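The number of weight values does not match the number of values in each group: aggregate passes the complete W column (four values) to weighted.mean together with each group's subset of Age (two values), so the lengths differ and the groups cannot be collapsed. Here is a very inelegant solution using a for loop.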
ages = data.frame(Indiv=c(1,1,2,2),Age=c(10,15,5,100),W=c(2,5,1,2))
age.Indiv <- vector()
for(i in unique(ages$Indiv)){
  age.Indiv <- append(age.Indiv, weighted.mean(ages[ages$Indiv == i, ]$Age,
                                               ages[ages$Indiv == i, ]$W))
}
names(age.Indiv) <- unique(ages$Indiv)
age.Indiv
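For comparison, the loop above can probably be written more compactly with split() and vapply(). This is only a sketch of the same per-group computation, not a different method; the names groups and age_by_indiv are chosen here for illustration.

```r
ages <- data.frame(Indiv = c(1, 1, 2, 2),
                   Age   = c(10, 15, 5, 100),
                   W     = c(2, 5, 1, 2))

# Split the rows by individual, so each piece carries its own Age and W.
groups <- split(ages, ages$Indiv)

# One weighted mean per piece; vapply checks that each result is a single number.
age_by_indiv <- vapply(groups,
                       function(d) weighted.mean(d$Age, d$W),
                       numeric(1))
age_by_indiv
```

The result is a numeric vector named by the Indiv values, like the one the loop builds.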
Upvotes: 1
Reputation: 4807
The problem is that aggregate does not split up the w argument, so weighted.mean is receiving subsets of ages$Age, but it is not receiving the equivalent subsets of ages$W.
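To make the mismatch concrete, here is a small sketch of what weighted.mean effectively sees for the first individual. The variable names (x_group1, w_all, w_group1, wm_group1) are illustrative and not part of the original code.

```r
ages <- data.frame(Indiv = c(1, 1, 2, 2),
                   Age   = c(10, 15, 5, 100),
                   W     = c(2, 5, 1, 2))

# aggregate splits only the data column: this is the piece for Indiv == 1.
x_group1 <- ages$Age[ages$Indiv == 1]

# The extra argument is forwarded unchanged, so it is still the full column.
w_all <- ages$W

length(x_group1)  # 2 values in the group
length(w_all)     # 4 weights, hence the "same length" error

# Subsetting the weights the same way gives the expected value for Indiv 1:
# (10*2 + 15*5) / (2 + 5)
w_group1  <- ages$W[ages$Indiv == 1]
wm_group1 <- weighted.mean(x_group1, w_group1)
wm_group1
```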
Try the plyr package! It's great. I use it in 95% of the scripts that I write.
library("plyr")
# the plyr package has functions that come in the format of _ _ ply
# the first blank is the input format, and the second is the output format
# d = data.frame, l = list, a = array, etc.
# thus, with ddply(), you supply a data.frame (ages), and it returns a data.frame (WmeanAge)
# .data is your data set
# .variables is the name of the column (or columns!) to be used to split .data
# .fun is the function you want to apply to each subset of .data
new.weighted.mean <- function(x, ...){
  weighted.mean(x=x[,"Age"], w=x[,"W"], ...)
}
WmeanAge <- ddply(.data=ages, .variables="Indiv", .fun=new.weighted.mean, na.rm=TRUE)
print(WmeanAge)
Upvotes: 2
Reputation: 206546
If you want to use base functions, here's one possibility
as.vector(by(ages[c("Age","W")],
             list(ages$Indiv),
             function(x) {
               do.call(weighted.mean, unname(x))
             }
))
Since aggregate won't subset multiple columns, I use the more general by and simplify the result to a vector.
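If staying with aggregate is preferred, one possible workaround (a sketch, not part of the answer above) relies on the weighted mean being sum(Age * W) / sum(W): both sums are ordinary per-column aggregates, so aggregate can compute them. The names tmp, AW, sums and wmean are illustrative.

```r
ages <- data.frame(Indiv = c(1, 1, 2, 2),
                   Age   = c(10, 15, 5, 100),
                   W     = c(2, 5, 1, 2))

# Per-row product of value and weight, next to the weight itself.
tmp <- data.frame(AW = ages$Age * ages$W, W = ages$W)

# Sum both columns within each individual.
sums <- aggregate(tmp, by = list(Indiv = ages$Indiv), FUN = sum)

# Weighted mean = sum of weighted values / sum of weights.
sums$wmean <- sums$AW / sums$W
sums[c("Indiv", "wmean")]
```

Unlike weighted.mean with na.rm = TRUE, this sketch does not handle missing values; it assumes Age and W are complete.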
Upvotes: 2
Reputation: 618
Doh, you beat me to it. But anyway, here is my answer using both plyr and dplyr:
ages = data.frame(Indiv = c(1,1,2,2),
                  Age = c(10,15,5,100),
                  W = c(2,5,1,2))
library(plyr)
ddply(ages, .(Indiv), summarize,
      mean = mean(Age),
      wmean = weighted.mean(Age, w=W))
library(dplyr)
ages %>%
  group_by(Indiv) %>%
  summarise(mean = mean(Age), wmean = weighted.mean(Age, W))
Upvotes: 11