Chetan Arvind Patil

Reputation: 866

R Programming - Parallel Processing of Loops

I have a data set with around 1,000 columns (parameters) and want to run a linear regression between every pair of them. That is, column 1 is regressed against each of the other 999 columns, column 2 against each of the remaining columns, and so on.

The unoptimized version of this approach would be:

loop <- seq_len(ncol(Data))
for (column in loop) {

    # Start with the column immediately after the current one
    nextColumn <- column + 1

    # Pair the current column with every later column
    while (nextColumn <= ncol(Data)) {

        # Analysis logic

        # Move on to the next column
        nextColumn <- nextColumn + 1

    }
}

The code above works, but it takes a long time. To speed it up, I want to use parallel processing in R. Several packages could be useful here, for example parallel and doParallel, as explained in this question.
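For context, here is a minimal sketch of how the parallel package could be applied to this kind of pairwise problem. It makes several assumptions: mtcars stands in for Data, the helper fit_pair is hypothetical, the slope from lm stands in for the unspecified analysis logic, and two workers is an arbitrary choice.

```r
library(parallel)

# Assumption: `Data` is a numeric data frame; mtcars stands in for it here.
Data <- mtcars

# Every unordered pair of column indices, one pair per matrix column.
pairs <- combn(ncol(Data), 2)

# Hypothetical helper: regress column j on column i and return the slope.
fit_pair <- function(k, pairs, Data) {
  i <- pairs[1, k]
  j <- pairs[2, k]
  fit <- lm(Data[[j]] ~ Data[[i]])
  unname(coef(fit)[2])
}

# Start a local cluster; the worker count (2) is an arbitrary choice.
cl <- makeCluster(2)
slopes <- parLapply(cl, seq_len(ncol(pairs)), fit_pair,
                    pairs = pairs, Data = Data)
stopCluster(cl)

# Flatten the list of per-pair slopes into a numeric vector.
slopes <- unlist(slopes)
```

Starting the workers and sending Data to them has a cost of its own, so whether this beats a sequential loop depends on how expensive the per-pair analysis is.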

However, there might be some overhead involved that, as a new R programmer, I am not aware of. I am looking for suggestions from R experts on a better way to write the code above and on whether any specific package would be useful.

Looking forward to suggestions, thanks.

Upvotes: 1

Views: 537

Answers (1)

CPak

Reputation: 13591

Use mapply like this:

X <- 1:(ncol(mtcars) - 1)     # first through penultimate column
Y <- 2:ncol(mtcars)           # second through last column
# sum() is only a placeholder for the per-pair analysis
mapply(function(x, y) sum(mtcars[, x], mtcars[, y]), X, Y)
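Note that X and Y as defined above pair each column only with its immediate neighbour. If every pair of columns is needed, as in the question, one possible sketch builds the index vectors with combn instead. The lm call and the slope extraction are assumptions standing in for the analysis logic, which the question does not specify.

```r
# All unordered column pairs of mtcars (used here only as example data).
pairs <- combn(ncol(mtcars), 2)
X <- pairs[1, ]   # predictor column index of each pair
Y <- pairs[2, ]   # response column index of each pair

# Placeholder analysis: slope of a simple linear regression of column y on column x.
slopes <- mapply(function(x, y) {
  unname(coef(lm(mtcars[[y]] ~ mtcars[[x]]))[2])
}, X, Y)

# Label each result with the column names involved.
names(slopes) <- paste(names(mtcars)[Y], names(mtcars)[X], sep = " ~ ")
```

With 11 columns this evaluates choose(11, 2) = 55 pairs. For roughly 1,000 columns, the same call would cover about 500,000 pairs.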

Upvotes: 2
