Chetan Arvind Patil

Optimizing apply() in R

The goal of the code below is to perform an iterative, pairwise analysis on a data set that has 400 columns and 6000 rows. It takes two columns at a time, performs the analysis on them, and moves on until all possible combinations have been covered.
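To get a sense of the workload, here is a back-of-the-envelope count in R. It assumes, as in the code below, that each unordered pair of columns is fitted once (earlier column as response) and that every residual is stored as an 8-byte double:

```r
n_rows <- 6000
n_cols <- 400

# Number of unordered column pairs, i.e. linear models to fit
n_models <- choose(n_cols, 2)

# Size of the final residual matrix (rows x models), 8 bytes per double
n_values <- n_rows * n_models
size_gb <- n_values * 8 / 1e9

cat("models:", n_models, "\n")
cat("residual matrix size (GB):", size_gb, "\n")
```

That is 79,800 calls to lm() and a result of roughly 3.8 GB, so both the number of model fits and the way the result is accumulated matter.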

A small subset of the large data set:

  data1       data2       data3      data4
-0.710003   -0.714271   -0.709946   -0.713645
-0.710458   -0.715011   -0.710117   -0.714157
-0.71071    -0.714048   -0.710235   -0.713515
-0.710255   -0.713991   -0.709722   -0.71397
-0.710585   -0.714491   -0.710223   -0.713885
-0.710414   -0.714092   -0.710166   -0.71434
-0.711255   -0.714116   -0.70945    -0.714173
-0.71097    -0.714059   -0.70928    -0.714059
-0.710343   -0.714576   -0.709338   -0.713644

Code using apply():

# Function
analysisFunc <- function () {

    # Fetch next data to be compared
    nextColumn <<- currentColumn + 1

    while (nextColumn <= ncol(Data)){

        # Fetch the two columns on which to perform analysis
        c1 <- Data[, currentColumn]
        c2 <- Data[, nextColumn]

        # Create linear model
        linearModel <- lm(c1 ~ c2)

        # Capture model data from summary
        modelData <- summary(linearModel)

        # Residuals as a one-column matrix (t(t(v)) turns a vector into n x 1)
        residualData <- t(t(modelData$residuals))

        # Keep on appending data
        linearData <<- cbind(linearData, residualData)

        # Fetch next column
        nextColumn <<- nextColumn + 1

    }

    # Increment the counter
    currentColumn <<- currentColumn + 1

}

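# Globals that analysisFunc() reads and updates must exist before the call
currentColumn <- 1
linearData <- NULL

# Apply on function
apply(Data, 2, function(x) analysisFunc ())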

I thought that using apply() instead of loops would speed up the code. However, it has no major effect: the run time is still more than two hours.
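One likely reason: apply() does not avoid looping. Internally it still calls the function once per column from R code, and here the function ignores its argument and walks the global counters anyway. A more probable bottleneck is `linearData <<- cbind(linearData, residualData)`, which builds a new matrix and copies everything accumulated so far on every iteration. Below is a minimal sketch of the same pairwise loop writing into a matrix allocated once up front. It assumes `Data` is a numeric data frame without missing values; the helper name `pairwise_residuals` and the small random `Data` are illustrative only:

```r
# Hypothetical helper: residuals of lm(column i ~ column j) for every i < j,
# written into a preallocated matrix instead of being grown with cbind()
pairwise_residuals <- function(Data) {
  k <- ncol(Data)
  out <- matrix(NA_real_, nrow = nrow(Data), ncol = choose(k, 2))
  labels <- character(choose(k, 2))
  col <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      col <- col + 1
      y <- Data[[i]]
      x <- Data[[j]]
      out[, col] <- residuals(lm(y ~ x))
      labels[col] <- paste(names(Data)[i], names(Data)[j], sep = "_")
    }
  }
  colnames(out) <- labels
  out
}

# Illustrative data: 50 rows, 5 numeric columns named V1..V5
set.seed(1)
Data <- as.data.frame(matrix(rnorm(250), ncol = 5))
linearData <- pairwise_residuals(Data)
dim(linearData)
```

This still fits one lm() per pair, so it removes the repeated copying but not the cost of the model fits themselves.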

Am I using apply() incorrectly? Is it a bad idea to have a while() loop inside an apply() call? Is there any other way I can improve this code?

This is the first time I am working with functional programming. Please let me know your suggestions. Thanks.

Upvotes: 2


Answers (1)

Parfait

Consider building an expand.grid() of column names and then calling mapply(), the multiple-input member of the apply family: you pass two or more vectors/lists and it runs a function across the inputs elementwise. With this approach you avoid growing an object inside a loop and running an inner while loop:
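To see how mapply() pairs up its inputs, here is a toy call in which the column names are just strings and no models are fitted:

```r
# mapply() calls FUN once per position: FUN(c1[1], c2[1]), FUN(c1[2], c2[2]), ...
c1 <- c("data1", "data1", "data2")
c2 <- c("data2", "data3", "data3")
labels <- mapply(function(x, y) paste(x, y, sep = "_"), c1, c2, USE.NAMES = FALSE)
print(labels)
```

Each call returns a single string, so mapply() simplifies the results to a character vector; when each call returns a vector of equal length (such as residuals), it simplifies to a matrix with one column per call instead.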

Data

Data <- read.table(text="  data1       data2       data3      data4
-0.710003   -0.714271   -0.709946   -0.713645
-0.710458   -0.715011   -0.710117   -0.714157
-0.71071    -0.714048   -0.710235   -0.713515
-0.710255   -0.713991   -0.709722   -0.71397
-0.710585   -0.714491   -0.710223   -0.713885
-0.710414   -0.714092   -0.710166   -0.71434
-0.711255   -0.714116   -0.70945    -0.714173
-0.71097    -0.714059   -0.70928    -0.714059
-0.710343   -0.714576   -0.709338   -0.713644", header=TRUE)

Process

# Data frame of all ordered column pairs, excluding a column paired with itself
modelcols <- subset(expand.grid(c1=names(Data), c2=names(Data), 
                    stringsAsFactors = FALSE), c1!=c2)

# Function
analysisFunc <- function(x, y) {
    # Fetch the two columns on which to perform analysis
    c1 <- Data[[x]]
    c2 <- Data[[y]]

    # Create linear model
    linearModel <- lm(c1 ~ c2)

    # Capture model data from summary
    modelData <- summary(linearModel)

    # Return the residuals (one column of the result matrix)
    modelData$residuals
}

# Apply function to return matrix of residuals
linearData <- mapply(analysisFunc, modelcols$c1, modelcols$c2)
# Rename matrix columns
colnames(linearData) <- paste(modelcols$c1, modelcols$c2, sep="_")
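Note that expand.grid() with c1 != c2 keeps both orderings of every pair (data1_data2 and data2_data1), so with 400 columns it fits 400 * 399 = 159,600 models, twice the 79,800 fitted by the question's loop, which only regresses an earlier column on a later one. If only those combinations are wanted, a minimal variant builds the pairs with combn() instead. The helper name `pairResiduals` is hypothetical, and residuals() stands in for summary()$residuals, which gives the same values for an unweighted lm():

```r
Data <- read.table(text = "data1 data2 data3 data4
-0.710003 -0.714271 -0.709946 -0.713645
-0.710458 -0.715011 -0.710117 -0.714157
-0.71071  -0.714048 -0.710235 -0.713515
-0.710255 -0.713991 -0.709722 -0.71397
-0.710585 -0.714491 -0.710223 -0.713885
-0.710414 -0.714092 -0.710166 -0.71434
-0.711255 -0.714116 -0.70945  -0.714173
-0.71097  -0.714059 -0.70928  -0.714059
-0.710343 -0.714576 -0.709338 -0.713644", header = TRUE)

# One row per unordered pair; the first column is the response, as in the question's loop
pairs <- t(combn(names(Data), 2))

# Hypothetical helper: residuals of lm(x ~ y) for two column names
pairResiduals <- function(x, y) {
  residuals(lm(Data[[x]] ~ Data[[y]]))
}

linearData <- mapply(pairResiduals, pairs[, 1], pairs[, 2])
colnames(linearData) <- paste(pairs[, 1], pairs[, 2], sep = "_")
```

With the four sample columns this yields six columns (data1_data2 through data3_data4) instead of twelve.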

Output

    data2_data1   data3_data1   data4_data1   data1_data2   data3_data2   data4_data2
1  1.440828e-04  8.629813e-05  1.514109e-04  5.583917e-04 -0.0001205821  2.866488e-04
2 -6.949384e-04 -2.508770e-04 -2.487813e-04 -1.005367e-04 -0.0001263202 -2.145225e-04
3  2.132192e-04 -4.609125e-04  4.551430e-04 -8.715424e-05 -0.0004593840  4.133856e-04
4  3.692403e-04  2.182627e-04 -1.116648e-04  3.835538e-04  0.0000408864 -4.244855e-05
5 -2.025772e-04 -4.032600e-04  5.442655e-05 -8.423568e-05 -0.0003484501  4.986815e-05
6  2.336373e-04 -2.838073e-04 -4.425935e-04  1.967203e-04 -0.0003805576 -4.109706e-04
7  2.661145e-05  1.250425e-04 -6.893342e-05 -6.508936e-04  0.0003408023 -2.436194e-04
8  1.456357e-04  3.991303e-04 -2.496687e-05 -3.501856e-04  0.0004980726 -1.304535e-04
9 -2.349110e-04  5.701233e-04  2.359596e-04  1.343401e-04  0.0005555326  2.921120e-04
    data1_data3   data2_data3   data4_data3   data1_data4   data2_data4   data3_data4
1  5.121547e-04  4.313395e-05  2.829814e-04  4.232081e-04  1.795365e-05 -9.584175e-05
2 -1.649379e-06 -6.684696e-04 -2.349827e-04  1.975728e-04 -7.112598e-04 -3.014160e-04
3 -2.942277e-04  3.141257e-04  4.029018e-04 -3.420290e-04  2.382149e-04 -3.760631e-04
4  3.371847e-04  2.859362e-04 -3.420612e-05  3.168009e-04  3.048006e-04  1.062117e-04
5 -1.651011e-04 -1.308671e-04  3.332034e-05 -5.127719e-05 -1.969902e-04 -3.890484e-04
6  2.550032e-05  2.586674e-04 -4.196917e-04  3.235528e-04  2.115955e-04 -3.627735e-04
7 -5.692790e-04  1.157675e-04 -2.277195e-04 -5.922595e-04  1.840773e-04  3.645036e-04
8 -2.258187e-04  1.445371e-04 -1.077903e-04 -3.583290e-04  2.386756e-04  5.422018e-04
9  3.812360e-04 -3.628313e-04  3.051868e-04  8.276013e-05 -2.870674e-04  5.122258e-04
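Much of the remaining time goes into lm() and summary(), which compute standard errors, t statistics and more than is needed when only residuals are kept. For a single predictor with an intercept the residuals have a closed form, r = (y - mean(y)) - b * (x - mean(x)) with b = cov(x, y) / var(x), so one response can be regressed on many predictors at once with vectorized arithmetic. This is a swapped-in technique, not what the answer above uses; the sketch assumes no missing values and no constant columns, and `residualsAgainst` is a hypothetical name:

```r
# Residuals of lm(y ~ X[, j]) for every column j of X, without calling lm()
residualsAgainst <- function(y, X) {
  X <- as.matrix(X)
  yc <- y - mean(y)                        # centred response
  Xc <- sweep(X, 2, colMeans(X))           # centred predictors, column by column
  b <- colSums(Xc * yc) / colSums(Xc^2)    # least-squares slope for each predictor
  yc - sweep(Xc, 2, b, `*`)                # residuals, one column per predictor
}

# Illustrative data: 100 observations, 3 predictors
set.seed(42)
X <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
y <- rnorm(100)
R <- residualsAgainst(y, X)
```

For the full data set this would be called once per response column, with the later columns as X, replacing up to 399 lm() calls with a few matrix operations. The output is still as large as the full residual matrix, so memory remains a constraint.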

Upvotes: 6
