Duck

Reputation: 39595

How to speed up row operations over large dataset when applying specific function

I am working with a large data frame in R called Q (a similarly structured example is built in the code below). It has 250,000 rows and 1,000 columns. My goal is to fit a time series model to each row and extract the coefficients from each model; in my case I use the auto.arima function from the forecast package. I have tried two approaches, which I include next:

library(forecast)
set.seed(123)
# Example data with the same structure as my real Q (250,000 x 1,000)
Q <- as.data.frame(matrix(rnorm(250000 * 1000), nrow = 250000, ncol = 1000, byrow = TRUE))
#Approach 1
models <- apply(Q, 1, auto.arima)
#Extract coeffs
coeffs <- lapply(models, function(x) x$coef)

#Approach 2
#Create a list and save coeff using a loop
tlist <- list(0)
for(i in 1:dim(Q)[1])
{
  models <- apply(Q[i,], 1, auto.arima)
  coeffs <- as.data.frame(lapply(models, function(x) as.data.frame(t(x$coef))))
  tlist[[i]] <- coeffs
  gc()
}

In Approach 1, I used the apply() function to build a list holding the models, and then lapply() to extract the coefficients. The issue with this approach is that it ran for 60 hours and still did not finish.

Approach 2 is a classic loop that applies the function to each row and saves the results in a list. The situation was the same: 30 hours and it still did not finish.

In both cases the task never completed and my computer eventually crashed. I do not know how to solve this, because my solutions are clearly very slow. My machine has 8 GB of RAM and runs 64-bit Windows. I would like to make this row-wise operation faster. Ideally I would add the resulting coefficients directly to Q, but if that is not possible, a list with the results would be fine. Q is a data frame, but it could also be a data.table.
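
For reference, this is roughly how I was thinking of combining the coefficients afterwards. It is only a sketch: it assumes coeffs is a list of named coefficient vectors (one per row) and uses data.table::rbindlist with fill = TRUE, since different models can return different coefficient names.

library(data.table)

## Sketch, assuming `coeffs` is a list of named numeric vectors (one per row).
## Each vector becomes a one-row table; fill = TRUE pads coefficients that a
## given model does not estimate with NA. Models with zero coefficients
## (e.g. ARIMA(0,0,0) with no mean) produce empty entries and may need extra handling.
coef_dt <- rbindlist(
  lapply(coeffs, function(cf) as.data.table(as.list(cf))),
  fill = TRUE
)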

Is there any way to boost my code in order to obtain my results? Many thanks for your help.

Upvotes: 0

Views: 545

Answers (1)

Cole

Reputation: 11255

As @IanCampbell says in the comments, the auto.arima function is where most of the time is spent. I am on Windows with a 2-core machine and I always turn to future.apply for parallel tasks.

I used only a 250 x 100 matrix - there was no need for me to test for 60 hours :). With 2 cores, the time went from 20s to 14s.

library(forecast)
library(future.apply)

set.seed(123)
nr = 250L
nc = 100L

mat <- matrix(rnorm(nr * nc), nrow = nr, ncol = nc, byrow = TRUE)

system.time(models1a <- apply(mat, 1L, auto.arima))
##   user  system elapsed 
##  19.84    0.02   20.04 

plan("multiprocess") ## needed for future_apply to make use of multiple cores
system.time(models1b <- future_apply(mat, 1L, auto.arima))

##   user  system elapsed 
##   0.48    0.02   14.22 

## future_lapply not needed - this is fast
identical(lapply(models1a, '[[', "coef"), lapply(models1b, '[[', "coef"))
## [1] TRUE
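
One caveat: on newer versions of the future package, the "multiprocess" plan is deprecated; on Windows the explicit equivalent is multisession. A minimal sketch (the workers = 2L value is just an assumption for a 2-core machine):

plan(multisession, workers = 2L)  ## explicit plan for Windows; workers = 2L assumed for a 2-core machine
models1c <- future_apply(mat, 1L, auto.arima)
plan(sequential)                  ## shut down the parallel workers when done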

Upvotes: 2
