Reputation: 445
I'm looking to create a cumulative mean that averages over multiple groups with a lag. It is for predictive analysis so I want each row to be the cumulative mean of all the rows before it (not including its own value).
This is a follow on from Grouped moving average in r
I'm sure there is a way to do this with rollapply and ave, I have been achieving this result with various moving windows using the below(just not a cummean):
library(zoo)
roll <- function(x, n) {
if (length(x) <= n) NA
else rollapply(x, list(-seq(n)), mean, fill = NA)
}
transform(DF, AVG2 = ave(Goals, Player, FUN = function(x) roll(x, 2)),
AVG3 = ave(Goals, Player, FUN = function(x) roll(x, 3)))
Here is the desired output:
Player Goals **AVG**
S 5
S 2 5
S 7 3.5
O 3
O 9 3
O 6 6
O 3 6
S 7 4.66
O 1 5.25
S 7 5.25
S 3 5.6
Q 8
S 3 5.16
O 4 4.4
P 1
S 9 4.857
S 4 5.375
Z 6
S 3 5.22
O 8 4.33
S 3 5
O 4 4.857
O 1 4.75
S 9 4.81
S 4 5.16
O 6 4.33
J 6
and here is the code to recreate the initial table in r
Player <- c('S','S','S','O','O','O','O','S','O','S','S','O','S','O','O','S','S','O','S','O','S','O','O','S','S','O','J')
Goals <- c(5,2,7,3,9,6,3,7,1,7,3,8,3,4,1,9,4,6,3,8,3,4,1,9,4,6,6)
data.frame(Player, Goals)
Any help is much appreciated
Upvotes: 3
Views: 1751
Reputation: 269684
1) We can use cumsum
in the base of R. No packages are used.
cumroll <- function(x) {
x <- head(x, -1)
c(NA, cumsum(x) / seq_along(x))
}
transform(DF, AVG = ave(Goals, Player, FUN = cumroll))
2) This could also replace cumroll
. It puts NaN in the postitions that are NA with cumroll
:
cumroll2 <- function(x) (cumsum(x) - x) / (seq_along(x) - 1)
transform(DF, AVG = ave(Goals, Player, FUN = cumroll2))
3) If you did want to use rollapply
here note that cumsum
could be replaced with rollapplyr(x, seq_along(x), sum)
in either of the above.
4) We could alternately use rollapply
like this which like cumroll2
uses NaNs.
library(zoo)
cumroll3 <- function(x) {
if (length(x) == 1) NaN
else rollapply(x, lapply(seq_along(x) - 1, function(x) -seq_len(x)), mean)
}
transform(DF, AVG = ave(Goals, Player, FUN = cumroll3))
Upvotes: 5
Reputation: 15163
One option is to use data.table
for the grouping and the cummean
function from dplyr
:
require(data.table)
require(dplyr)
Player <- c('S','S','S','O','O','O','O','S','O','S','S','O','S','O','O','S','S','O','S','O','S','O','O','S','S','O','J')
Goals <- c(5,2,7,3,9,6,3,7,1,7,3,8,3,4,1,9,4,6,3,8,3,4,1,9,4,6,6)
df<-data.frame(Player, Goals)
dt<-data.table(df)
lcummean<-function(x){
head(c(NA,cummean(x)),-1)
}
dt[,ave:=lcummean(Goals),by=Player]
> dt
Player Goals ave
1: S 5 NA
2: S 2 5.000000
3: S 7 3.500000
4: O 3 NA
5: O 9 3.000000
6: O 6 6.000000
7: O 3 6.000000
8: S 7 4.666667
9: O 1 5.250000
10: S 7 5.250000
11: S 3 5.600000
12: O 8 4.400000
13: S 3 5.166667
14: O 4 5.000000
15: O 1 4.857143
16: S 9 4.857143
17: S 4 5.375000
18: O 6 4.375000
19: S 3 5.222222
20: O 8 4.555556
21: S 3 5.000000
22: O 4 4.900000
23: O 1 4.818182
24: S 9 4.818182
25: S 4 5.166667
26: O 6 4.500000
27: J 6 NA
Player Goals ave
If you don't mind warning messages, you can also just do this:
dt[,ave:=c(NA,cummean(Goals)),by=Player]
since the last element will be discarded, but you will get warning messages about it.
Upvotes: 4
Reputation: 83235
Using the cummean
function of dplyr:
library(dplyr)
df1 %>%
group_by(Player) %>%
mutate(mean_prev_goals = lag(cummean(Goals), n=1, default=0))
gives:
Source: local data frame [27 x 3]
Groups: Player [3]
Player Goals mean_prev_goals
(fctr) (dbl) (dbl)
1 S 5 0.000000
2 S 2 5.000000
3 S 7 3.500000
4 O 3 0.000000
5 O 9 3.000000
6 O 6 6.000000
7 O 3 6.000000
8 S 7 4.666667
9 O 1 5.250000
10 S 7 5.250000
.. ... ... ...
Upvotes: 3