user6452857
user6452857

Reputation: 117

What is the best practice for creating a 100s of new columns to a dataframe at the highest speed and least amount of code?

I want to do lag and cummean for 100s of columns. I prefer to use a dataframe to add in the parameters to a function. I have tried lazy eval with dplyr but it fails when doing a function with mapply, with the columns of a dataframe as parameters. I could do base R but am afraid it might slow the speed especially at dataframes as big as 700 variables by 60,000 rows.

before data frame

date        name    team     score1 score2 height
1/1/2001     Bill   eagles     1      2     5
1/1/2001    George  eagles     2      7     2
1/1/2001    Aaron   eagles     1      2     4
1/2/2001     Bill   eagles     1      2     5
1/2/2001    George  eagles     2      4     2
1/2/2001    Aaron   eagles     2      2     4
1/3/2001     Bill   eagles     2      3     5
1/3/2001    George  eagles     2      7     2
1/3/2001    Aaron   eagles     1      2     4

after data frame

date        name    team     score1 score2 height  score1_avg height_average
1/1/2001     Bill   eagles     1      2     5          NA           NA 
1/1/2001    George  eagles     2      7     2          NA           NA 
1/1/2001    Aaron   eagles     1      2     4          NA           NA 
1/2/2001     Bill   eagles     1      2     5          1.33         3.66
1/2/2001    George  eagles     2      4     2          1.33         3.66
1/2/2001    Aaron   eagles     2      2     4          1.33         3.66
1/3/2001     Bill   eagles     2      3     5          1.5          3.66 
1/3/2001    George  eagles     2      7     2          1.5          3.66 
1/3/2001    Aaron   eagles     1      2     4          1.5          3.66 

Here is what I was doing for one column but I need it scalable for 100s

 df %>%
  group_by(team) %>%
  mutate(score1_avg = lag(cummean((score1))))

Upvotes: 0

Views: 44

Answers (1)

akrun
akrun

Reputation: 887511

We can use data.table by assigning (:=) in place without copying

library(data.table)
setDT(df)[, paste0(names(df)[4:6], "avg") := lapply(.SD, function(x) 
              shift(cummean(x))[[1]]), team, .SDcols = score1:height]

Upvotes: 1

Related Questions