Reputation: 866
I have a data set with around 1000 columns (parameters) and want to run a linear regression between each pair of them. So the data in column 1 will be regressed against each of the other 999 columns, and so on.
The nonoptimized version of this approach would be:
loop <- seq_len(ncol(Data))
for (column in loop) {
  # Index of the first column to compare against
  nextColumn <- column + 1
  # Walk through all remaining columns
  while (nextColumn <= ncol(Data)) {
    # Analysis logic for the pair (column, nextColumn)
    # Move on to the next column
    nextColumn <- nextColumn + 1
  }
}
The code above works, but it takes a lot of time. To optimize it, I want to use parallel processing in R. Several packages could be useful here, for example parallel and doParallel, as explained in this question.
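A rough sketch of what a parallel version might look like with the built-in parallel package. The small random `Data` frame, the helper `fit_pair`, and the choice of two workers are all hypothetical placeholders, and the slope of a simple `lm` fit merely stands in for the real analysis logic:

```r
library(parallel)

# Hypothetical stand-in for Data: 6 numeric columns of random values
set.seed(1)
Data <- as.data.frame(matrix(rnorm(60), ncol = 6))

# Every unordered pair of column indices, one pair per column of `pairs`
pairs <- combn(ncol(Data), 2)

# Hypothetical analysis step: slope from regressing column j on column i
fit_pair <- function(k, pairs, Data) {
  i <- pairs[1, k]
  j <- pairs[2, k]
  unname(coef(lm(Data[[j]] ~ Data[[i]]))[2])
}

cl <- makeCluster(2)  # two workers; adjust to the machine
slopes <- parLapply(cl, seq_len(ncol(pairs)), fit_pair,
                    pairs = pairs, Data = Data)
stopCluster(cl)
slopes <- unlist(slopes)
```

Each worker receives a copy of `Data`, so for a wide data set the time spent shipping data to the workers may offset part of the gain.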
However, there might be some overhead involved that, as a new R programmer, I am not aware of. I am looking for suggestions from R experts on a better way to write the above code in R, and on whether any specific package would be useful.
Looking forward to suggestions, thanks.
Upvotes: 1
Views: 537
Reputation: 13591
Use mapply like this:
X <- 1:(ncol(mtcars)-1) # first through penultimate column
Y <- 2:ncol(mtcars) # second through last column
mapply(function(x,y) sum(mtcars[,x],mtcars[,y]), X, Y)
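Note that the call above pairs each column only with its immediate neighbour (1 with 2, 2 with 3, and so on) and uses `sum` as a placeholder. If every pair of columns is needed, one possible sketch uses `combn` to enumerate the pairs and `lm` for the fit. Reporting R-squared is an assumption here; swap in whatever statistic the analysis actually needs:

```r
# All unordered pairs of column indices of mtcars (11 columns, so 55 pairs)
pairs <- combn(ncol(mtcars), 2)

# R-squared of a simple linear regression for each pair
r2 <- apply(pairs, 2, function(p) {
  fit <- lm(mtcars[[p[2]]] ~ mtcars[[p[1]]])
  summary(fit)$r.squared
})

# Collect the results with readable column names
result <- data.frame(
  x  = names(mtcars)[pairs[1, ]],
  y  = names(mtcars)[pairs[2, ]],
  r2 = r2
)
head(result[order(-result$r2), ])
```

For a simple linear regression with an intercept, R-squared equals the squared Pearson correlation, so `cor(mtcars)^2` yields the same values in a single vectorized call and may be enough if only the strength of each fit is needed.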
Upvotes: 2