deepAgrawal

Reputation: 743

Faster aggregate multiple columns

I have the following function, which runs hundreds of times. This aggregation is the bottleneck in my code. Is it possible to make it faster using just data.table, or by rewriting the function with Rcpp?

  logit.gr <- function(DT){
    temp1 <- DT[, lapply(.SD, function(x) col1*sum(y*(x - sum(x*exp(col2))))), by = .(main_idx), .SDcols = c('col3','col4')]
    return(-colSums(temp1[, c('col3','col4'), with = FALSE]))
  }

where DT is

DT <- data.table(main_idx = c(rep('A',4), rep('B', 5)), col1 = runif(9), col2 = -2+runif(9), col3 = 1+runif(9), col4 = 1+runif(9), y = runif(9))

Upvotes: 1

Views: 220

Answers (1)

MKR

Reputation: 20095

I think a way to optimize this is:

  1. Put the sum inside the function passed to lapply. This yields only one row per main_idx in the resulting data.table.
  2. Chain a second [ call to sum the col3 and col4 columns.

library(data.table)
DT[, lapply(.SD, function(x) sum(col1*sum(y*(x - sum(x*exp(col2)))))), 
   by = .(main_idx), .SDcols = c('col3','col4')][
         ,.(col3 = -sum(col3), col4 = -sum(col4))]
#Result
#     col3      col4 
#0.7575290 0.2423651 

Data:

DT <- data.table(main_idx = c(rep('A',4), rep('B', 5)), 
              col1 = runif(9), col2 = -2+runif(9), 
              col3 = 1+runif(9), col4 = 1+runif(9), y = runif(9))
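
The rewrite above can be checked against the original function, and the per-group arithmetic can be reduced a little further. Within one group, sum(y*(x - c)) equals sum(y*x) - c*sum(y) for a scalar c, and sum(col1*s) equals sum(col1)*s for a scalar s, so each group needs only a few plain sums. The following is a minimal sketch under stated assumptions: the name logit.gr2 and the seed are hypothetical and not part of the question, and whether this is actually faster on real data has not been measured here.

```r
library(data.table)

set.seed(42)  # assumed seed, only so that this sketch is reproducible
DT <- data.table(main_idx = c(rep('A', 4), rep('B', 5)),
                 col1 = runif(9), col2 = -2 + runif(9),
                 col3 = 1 + runif(9), col4 = 1 + runif(9), y = runif(9))

# Original function from the question, used as the reference result.
logit.gr <- function(DT) {
  temp1 <- DT[, lapply(.SD, function(x) col1 * sum(y * (x - sum(x * exp(col2))))),
              by = .(main_idx), .SDcols = c('col3', 'col4')]
  -colSums(temp1[, c('col3', 'col4'), with = FALSE])
}

# Hypothetical variant: one row per group, with the group-level sums
# that do not depend on the column computed once per group.
logit.gr2 <- function(DT) {
  cols <- c('col3', 'col4')
  per_group <- DT[, {
    s1 <- sum(col1)   # sum(col1 * s) == sum(col1) * s for scalar s
    sy <- sum(y)      # sum(y * (x - c)) == sum(y * x) - c * sum(y)
    w  <- exp(col2)   # computed once, reused for every column
    lapply(.SD, function(x) s1 * (sum(y * x) - sy * sum(x * w)))
  }, by = .(main_idx), .SDcols = cols]
  -colSums(per_group[, cols, with = FALSE])
}

# Both versions should agree up to floating-point tolerance.
all.equal(logit.gr(DT), logit.gr2(DT))
```

If the two results agree on your real data as well, timing both versions on a realistically sized table (for example with system.time) would show whether the change is worth keeping.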

Upvotes: 1
