Reputation: 117
I want to apply lag and cummean to hundreds of columns, and I would prefer to supply the parameters to a function from a data frame. I have tried lazy evaluation with dplyr, but it fails when the function is called through mapply with the columns of a data frame as parameters. I could do this in base R, but I am afraid it would be slow on data frames as large as 700 variables by 60,000 rows.
Before data frame:
date name team score1 score2 height
1/1/2001 Bill eagles 1 2 5
1/1/2001 George eagles 2 7 2
1/1/2001 Aaron eagles 1 2 4
1/2/2001 Bill eagles 1 2 5
1/2/2001 George eagles 2 4 2
1/2/2001 Aaron eagles 2 2 4
1/3/2001 Bill eagles 2 3 5
1/3/2001 George eagles 2 7 2
1/3/2001 Aaron eagles 1 2 4
After data frame (desired output):
date name team score1 score2 height score1_avg height_average
1/1/2001 Bill eagles 1 2 5 NA NA
1/1/2001 George eagles 2 7 2 NA NA
1/1/2001 Aaron eagles 1 2 4 NA NA
1/2/2001 Bill eagles 1 2 5 1.33 3.66
1/2/2001 George eagles 2 4 2 1.33 3.66
1/2/2001 Aaron eagles 2 2 4 1.33 3.66
1/3/2001 Bill eagles 2 3 5 1.5 3.66
1/3/2001 George eagles 2 7 2 1.5 3.66
1/3/2001 Aaron eagles 1 2 4 1.5 3.66
Here is what I was doing for one column, but I need it to scale to hundreds of columns:
df %>%
  group_by(team) %>%
  mutate(score1_avg = lag(cummean(score1)))
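A sketch of what a scaled-up dplyr version might look like, using across() (this requires dplyr >= 1.0 and is my own assumption; I have not verified it on the full data):

library(dplyr)

df %>%
  group_by(team) %>%
  mutate(across(score1:height,
                ~ lag(cummean(.x)),
                .names = "{.col}_avg")) %>%
  ungroup()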
Upvotes: 0
Views: 44
Reputation: 887511
We can use data.table, assigning with := by reference so the new columns are added in place without copying:
library(data.table)
library(dplyr)   # for cummean()

setDT(df)[, paste0(names(df)[4:6], "_avg") := lapply(.SD, function(x)
    shift(cummean(x))), team, .SDcols = score1:height]
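A minimal end-to-end sketch, rebuilding the question's sample data (values copied from the "before" table; the reconstruction itself is not part of the original answer) and running the same call:

library(data.table)
library(dplyr)   # for cummean()

# Sample data from the question
df <- data.frame(
  date   = rep(c("1/1/2001", "1/2/2001", "1/3/2001"), each = 3),
  name   = rep(c("Bill", "George", "Aaron"), times = 3),
  team   = "eagles",
  score1 = c(1, 2, 1, 1, 2, 2, 2, 2, 1),
  score2 = c(2, 7, 2, 2, 4, 2, 3, 7, 2),
  height = c(5, 2, 4, 5, 2, 4, 5, 2, 4)
)

# := adds the new columns by reference (no copy of df), which is what keeps
# this cheap even at 700 columns by 60,000 rows
setDT(df)[, paste0(names(df)[4:6], "_avg") := lapply(.SD, function(x)
    shift(cummean(x))), team, .SDcols = score1:height]

df[]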
Upvotes: 1