Reputation: 597
I am trying to move some of my slower processes in dplyr to using data.table, however can not seem to find an efficient way of using a "mutate_at" type approach in data.table. Especially, when it comes to naming the new variables created & applying more than 1 function to multiple columns.
Below I use mutate_at to apply 2 different functions to 2 different columns with associated naming + using a group by statement. I want to be able to replicate this easily in data.table.
library(tibble)
library(zoo)
Data = tibble(A = rep(c(1,2),50),
B = 1:100,
C = 101:200)
Data %>%
group_by(A) %>%
mutate_at(vars(B,C), funs(Roll.Mean.Week = 7 * rollapply(., width = 7, mean, align = "right", fill = 0, na.rm = T, partial = T),
Roll.Mean.Two.Week = 7 * rollapply(., width = 14, mean, align = "right", fill = 0, na.rm = T, partial = T))) %>%
ungroup()
Upvotes: 5
Views: 1627
Reputation: 25225
And also from Apply multiple functions to multiple columns in data.table
f <- function(x) lapply(c(Roll.Mean.Week=7L, Roll.Mean.Two.Week=14L),
function(n) rollapplyr(x, width=n, mean, fill=0L, na.rm=TRUE, partial=TRUE))
DT[, unlist(lapply(.SD, f), recursive=FALSE), A, .SDcols=B:C]
output:
A B._Roll.Mean.Week B._Roll.Mean.Two.Week C._Roll.Mean.Week C._Roll.Mean.Two.Week
1: 1 1 1 101 101
2: 1 2 2 102 102
3: 1 3 3 103 103
4: 1 4 4 104 104
5: 1 5 5 105 105
6: 1 6 6 106 106
7: 1 7 7 107 107
8: 1 9 8 109 108
9: 1 11 9 111 109
10: 1 13 10 113 110
[trunc...]
Upvotes: 1
Reputation: 887118
With data.table
, we can specify the columns of interest in .SDcols
, loop through the .SD
with lapply
and apply the function of interest. Here, the funcion rollapply
is repeated with only change in width
parameter. So, it may be better to create a function to avoid repeating the whole arguments. Also, while applying the function (f1
), the output can be kept in a list
, later unlist
with recursive = FALSE
and assign (:=
) to columns of interest
library(data.table)
library(zoo)
nm1 <- c("B", "C")
nm2 <- paste0(nm1, "_Roll.Mean.Week")
nm3 <- paste0(nm1, "_Roll.Mean.Two.Week")
f1 <- function(x, width) rollapply(x, width = width, mean,
align = "right", fill = 0, na.rm = TRUE, partial = TRUE)
setDT(Data)[, c(nm2, nm3) := unlist(lapply(.SD, function(x)
list(f1(x, 7), f1(x, 14))), recursive = FALSE), by = A, .SDcols = nm1]
head(Data)
# A B C B_Roll.Mean.Week C_Roll.Mean.Week B_Roll.Mean.Two.Week C_Roll.Mean.Two.Week
#1: 1 1 101 1 1 101 101
#2: 2 2 102 2 2 102 102
#3: 1 3 103 2 2 102 102
#4: 2 4 104 3 3 103 103
#5: 1 5 105 3 3 103 103
#6: 2 6 106 4 4 104 104
Note that funs
is deprecated in tidyverse
and in its place, can use list(~
or just ~
Data %>%
group_by(A) %>%
mutate_at(vars(B,C), list(Roll.Mean.Week = ~f1(., 7),
Roll.Mean.Two.Week = ~ f1(., 14)))%>%
ungroup()
Upvotes: 4