Monal

Reputation: 127

Weighted mean with by function

I'm trying to get a weighted mean for a couple of categories and want to use by(df$A, df$B, function(x) weighted.mean(x, df$C)). This doesn't work, of course. Is there a way to do this using by() and weighted.mean()?

 df <- data.frame(A = c(1, 4, 56, 4, 3), B = c('hi', 'gb', 'hi', 'gb', 'yo'), C = c(5, 2, 4, 1, 3))

 by(df$A, df$B, function(x) weighted.mean(x, df$C)) # doesn't work: df$C is not subset by group

I have a bunch of workarounds, but it would be so simple if I could just use that format.

Upvotes: 2

Views: 204

Answers (3)

Stephan Kolassa

Reputation: 8267

Or simply recreate the calculation used by weighted.mean():

by(df, df$B, function(d) with(d, sum(A * C) / sum(C)))

df$B: gb
[1] 4
------------------------------------------------------------ 
df$B: hi
[1] 25.44444
------------------------------------------------------------ 
df$B: yo
[1] 3
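
As a quick sanity check on the formula above, here is a minimal sketch that compares the manual calculation with weighted.mean() for the "hi" group only. The names hi, manual, builtin and same are made up for illustration, and the data frame is the one from the question. Note that the manual formula has no na.rm handling, so the two only agree like this when there are no missing values.

```r
# Same toy data as in the question.
df <- data.frame(A = c(1, 4, 56, 4, 3),
                 B = c("hi", "gb", "hi", "gb", "yo"),
                 C = c(5, 2, 4, 1, 3))

# Rows of the "hi" group: A = 1, 56 with weights C = 5, 4.
hi <- df[df$B == "hi", ]

manual  <- sum(hi$A * hi$C) / sum(hi$C)  # (1*5 + 56*4) / (5 + 4) = 229 / 9
builtin <- weighted.mean(hi$A, hi$C)

# Compare with a numeric tolerance rather than ==.
same <- isTRUE(all.equal(manual, builtin))
print(same)
```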

Upvotes: 3

David Arenburg

Reputation: 92282

Here's a simple and efficient solution using data.table:

library(data.table)
setDT(df)[, .(WM = weighted.mean(A, C)), B]
#     B       WM
# 1: hi 25.44444
# 2: gb  4.00000
# 3: yo  3.00000

Or using a split/sapply combination from base R:

sapply(split(df, df$B), function(x) weighted.mean(x$A, x$C))
#      gb       hi       yo 
# 4.00000 25.44444  3.00000 

Or with dplyr:

library(dplyr)
df %>%
  group_by(B) %>%
  summarise(WM = weighted.mean(A, C))
# Source: local data frame [3 x 2]
# 
# B       WM
# 1 gb  4.00000
# 2 hi 25.44444
# 3 yo  3.00000

Upvotes: 4

konvas

Reputation: 14346

You need to pass the weights along with the values to be averaged in by():

by(df[c("A","C")], df$B, function(x) weighted.mean(x$A, x$C))
# df$B: gb
# [1] 4
# ------------------------------------------------------------ 
# df$B: hi
# [1] 25.44444
# ------------------------------------------------------------ 
# df$B: yo
# [1] 3
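
To illustrate why the weights have to travel with the values, here is a rough sketch (not part of the original attempt) that reproduces the length mismatch for a single group and then shows one possible alternative: grouping row indices with tapply() so that both columns are subset together. The names hi_rows, msg and wm are hypothetical.

```r
# Same toy data as in the question.
df <- data.frame(A = c(1, 4, 56, 4, 3),
                 B = c("hi", "gb", "hi", "gb", "yo"),
                 C = c(5, 2, 4, 1, 3))

# Inside the grouped call, x holds only one group's values,
# but df$C still holds the weights for all 5 rows.
hi_rows <- df$B == "hi"
msg <- tryCatch(weighted.mean(df$A[hi_rows], df$C),
                error = function(e) conditionMessage(e))
print(msg)  # the error message raised by weighted.mean()

# Alternative: group the row indices, then subset A and C with the same index.
wm <- tapply(seq_len(nrow(df)), df$B,
             function(i) weighted.mean(df$A[i], df$C[i]))
print(wm)
```

This returns a named array rather than a by object, which may be easier to reuse in later calculations.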

Upvotes: 4
