Henry
Henry

Reputation: 1686

Weighted average in R based on a few rows

I have a data frame like...

date, name, area, value
01/01/15, brian, 20, 3
01/01/15, james, 30, 6
02/01/15, brian, 30, 4
02/01/15, james, 30, 5

And I'd like to calculate a weighted average from this on a day-by-day basis.

In this simplified example, I would want (20*3+30*6)/(20+30) = 01/01/15 weighted average and (30*4+30*5)/(30+30) = 02/01/15 weighted average etc.

The number of rows that contribute to a day's weighted average are not known a priori and can be dynamic. There are several values that I would want the weighted average for, but of course if I have a single working example it should be easy to generalise

Thank you!

Upvotes: 2

Views: 1319

Answers (3)

mattdevlin
mattdevlin

Reputation: 1095

You could use the by function

by(df, df$date, function(grp) weighted.mean(grp$value, grp$area), simplify=FALSE)

This returns a list (simplify=FALSE) with the dates as the names and weighted means as the values.

You can then use unlist if you want a named vector or data.frame(dates=names(var), means=unlist(var)) for a dataframe.

Upvotes: 0

Colonel Beauvel
Colonel Beauvel

Reputation: 31161

You can do:

library(data.table)

setDT(df)[,.(waverage=sum(area*value)/sum(area)), date]

#       date waverage
#1: 01/01/15      4.8
#2: 02/01/15      4.5

Or using weighted.mean:

setDT(df)[,.(waverage=weighted.mean(value,area)), date]

Base R you can use ave:

with(df, ave(value*area, date, FUN=sum)/ave(area, date, FUN=sum))

Upvotes: 3

lukeA
lukeA

Reputation: 54237

There are many ways to do this, e.g.

stack(lapply(split(df[, 3:4], df$date), function(df) with(df, weighted.mean(x = value, w = area))))
# values      ind
# 1    4.8 01/01/15
# 2    4.5 02/01/15

Upvotes: 0

Related Questions