runningbirds
runningbirds

Reputation: 6615

grouped weighted mean in r (preferably with ddply but whatever works)

I have a dataset where I want to take both the normal 'mean' and the weighted mean by group. Each group can be considered a different portfolio or stock, price is the price of that portfolio or stock, size is the shares, gain is the return in percentage, and so market cap would be price*size.

Weighted mean would be the gain relative to the market cap by group. I run the code below and it obviously looks wrong but for the life of me I can't figure out what I am missing:

mydf= structure(list(group = structure(c(1L, 2L, 1L, 2L, 1L), .Label = c("a","b"), class = "factor"), 
                     price = c(15, 20, 10, 40, 20), size = c(100, 10, 50, 50, 1000), 
                     gain = c(0.03, 0.02, 0.05, 0.1, 0.01), wt = c(1500, 200, 500, 2000, 20000)), 
                .Names = c("group", "price", "size", "gain", "wt"), row.names = c(NA, -5L), 
                class = "data.frame")
mydf
library(plyr)
ddply(x, .(group), summarise,normal_mean= mean(gain), 
      wt_mean= weighted.mean(x$price*x$size,x$gain))

here I multiplied price and size together, you could also just use wt column in the weighted.mean function or supposedly...

weighted mean does not seem to be calculating over my groups but across all rows. any help?

Upvotes: 0

Views: 235

Answers (2)

rmuc8
rmuc8

Reputation: 2989

An approach with dplyr

mydf %>% group_by(group) %>%
summarise (mean=mean(gain), avgwt = weighted.mean(gain,wt))


  group mean      avgwt
1     a 0.03 0.01227273
2     b 0.06 0.09272727

Upvotes: 1

Colonel Beauvel
Colonel Beauvel

Reputation: 31161

An approach with data.table

library(data.table)

setDT(mydf)[,list(normalMean=mean(gain), 
             weightedMean=weighted.mean(gain, wt/sum(wt))), 
             by = group]

#   group normalMean weightedMean
#1:     a       0.03   0.01227273
#2:     b       0.06   0.09272727

Upvotes: 2

Related Questions