Reputation: 743
I have the following function, which runs hundreds of times. This aggregation is the bottleneck in my code. Is it possible to make it faster using just data.table, or by rewriting the function with Rcpp?
logit.gr <- function(DT){
  temp1 <- DT[, lapply(.SD, function(x) col1*sum(y*(x - sum(x*exp(col2))))),
              by = .(main_idx), .SDcols = c('col3','col4')]
  return(-colSums(temp1[, c('col3','col4'), with = FALSE]))
}
where DT is
DT <- data.table(main_idx = c(rep('A',4), rep('B', 5)), col1 = runif(9), col2 = -2+runif(9), col3 = 1+runif(9), col4 = 1+runif(9), y = runif(9))
Upvotes: 1
Views: 220
Reputation: 20095
I think a way to optimize is:

- `sum` should be added in the function used in `lapply` itself. It will result in only 1 row per `main_idx` in the resultant `data.table`.
- A chain of `[` operators should be used to `sum` the columns `col3` and `col4`.
library(data.table)
DT[, lapply(.SD, function(x) sum(col1*sum(y*(x - sum(x*exp(col2)))))),
by = .(main_idx), .SDcols = c('col3','col4')][
,.(col3 = -sum(col3), col4 = -sum(col4))]
#Result
# col3 col4
#0.7575290 0.2423651
Data:
DT <- data.table(main_idx = c(rep('A',4), rep('B', 5)),
col1 = runif(9), col2 = -2+runif(9),
col3 = 1+runif(9), col4 = 1+runif(9), y = runif(9))
Upvotes: 1