Reputation: 866
The goal of the code below is to perform recursive and iterative analysis on a data set that has 400 columns and 6000 rows. It takes two columns at a time and performs the analysis on them, then moves on through all the possible combinations.
A small subset of the large data set being used:
data1 data2 data3 data4
-0.710003 -0.714271 -0.709946 -0.713645
-0.710458 -0.715011 -0.710117 -0.714157
-0.71071 -0.714048 -0.710235 -0.713515
-0.710255 -0.713991 -0.709722 -0.71397
-0.710585 -0.714491 -0.710223 -0.713885
-0.710414 -0.714092 -0.710166 -0.71434
-0.711255 -0.714116 -0.70945 -0.714173
-0.71097 -0.714059 -0.70928 -0.714059
-0.710343 -0.714576 -0.709338 -0.713644
Code using apply():
# Function
analysisFunc <- function() {
  # Fetch next data to be compared
  nextColumn <<- currentColumn + 1
  while (nextColumn <= ncol(Data)) {
    # Fetch the two columns on which to perform analysis
    c1 <- Data[, currentColumn]
    c2 <- Data[, nextColumn]
    # Create linear model
    linearModel <- lm(c1 ~ c2)
    # Capture model data from summary
    modelData <- summary(linearModel)
    # Residuals
    residualData <- t(t(modelData$residuals))
    # Keep on appending data
    linearData <<- cbind(linearData, residualData)
    # Fetch next column
    nextColumn <<- nextColumn + 1
  }
  # Increment the counter
  currentColumn <<- currentColumn + 1
}
# Apply on function
apply(Data, 2, function(x) analysisFunc())
I thought that using apply() instead of loops would help me optimize the code. However, it seems to have no major effect: the run time is still more than two hours.
Does anyone think I am going wrong in how apply() has been used? Is having while() within the apply() call not a good idea? Is there any other way I can improve this code?
This is the first time I am working with functional programming. Please let me know your suggestions, thanks.
Upvotes: 2
Views: 227
Reputation: 107652
Consider building an expand.grid of column names and then using mapply, the multiple-input version of the apply family, where you pass two or more vectors/lists and run a function across the inputs elementwise. With this approach you avoid growing objects inside a loop and running an inner while loop:
Data
Data <- read.table(text=" data1 data2 data3 data4
-0.710003 -0.714271 -0.709946 -0.713645
-0.710458 -0.715011 -0.710117 -0.714157
-0.71071 -0.714048 -0.710235 -0.713515
-0.710255 -0.713991 -0.709722 -0.71397
-0.710585 -0.714491 -0.710223 -0.713885
-0.710414 -0.714092 -0.710166 -0.71434
-0.711255 -0.714116 -0.70945 -0.714173
-0.71097 -0.714059 -0.70928 -0.714059
-0.710343 -0.714576 -0.709338 -0.713644", header=TRUE)
Process
# Data frame of all combinations excluding same columns
modelcols <- subset(expand.grid(c1=names(Data), c2=names(Data),
                                stringsAsFactors = FALSE), c1!=c2)
# Function
analysisFunc <- function(x,y) {
  # Fetch the two columns on which to perform analysis
  c1 <- Data[[x]]
  c2 <- Data[[y]]
  # Create linear model
  linearModel <- lm(c1 ~ c2)
  # Capture model data from summary
  modelData <- summary(linearModel)
  # Residuals
  residualData <- modelData$residuals
}
# Apply function to return matrix of residuals
linearData <- mapply(analysisFunc, modelcols$c1, modelcols$c2)
# re-naming matrix columns
colnames(linearData) <- paste(modelcols$c1, modelcols$c2, sep="_")
Output
data2_data1 data3_data1 data4_data1 data1_data2 data3_data2 data4_data2
1 1.440828e-04 8.629813e-05 1.514109e-04 5.583917e-04 -0.0001205821 2.866488e-04
2 -6.949384e-04 -2.508770e-04 -2.487813e-04 -1.005367e-04 -0.0001263202 -2.145225e-04
3 2.132192e-04 -4.609125e-04 4.551430e-04 -8.715424e-05 -0.0004593840 4.133856e-04
4 3.692403e-04 2.182627e-04 -1.116648e-04 3.835538e-04 0.0000408864 -4.244855e-05
5 -2.025772e-04 -4.032600e-04 5.442655e-05 -8.423568e-05 -0.0003484501 4.986815e-05
6 2.336373e-04 -2.838073e-04 -4.425935e-04 1.967203e-04 -0.0003805576 -4.109706e-04
7 2.661145e-05 1.250425e-04 -6.893342e-05 -6.508936e-04 0.0003408023 -2.436194e-04
8 1.456357e-04 3.991303e-04 -2.496687e-05 -3.501856e-04 0.0004980726 -1.304535e-04
9 -2.349110e-04 5.701233e-04 2.359596e-04 1.343401e-04 0.0005555326 2.921120e-04
data1_data3 data2_data3 data4_data3 data1_data4 data2_data4 data3_data4
1 5.121547e-04 4.313395e-05 2.829814e-04 4.232081e-04 1.795365e-05 -9.584175e-05
2 -1.649379e-06 -6.684696e-04 -2.349827e-04 1.975728e-04 -7.112598e-04 -3.014160e-04
3 -2.942277e-04 3.141257e-04 4.029018e-04 -3.420290e-04 2.382149e-04 -3.760631e-04
4 3.371847e-04 2.859362e-04 -3.420612e-05 3.168009e-04 3.048006e-04 1.062117e-04
5 -1.651011e-04 -1.308671e-04 3.332034e-05 -5.127719e-05 -1.969902e-04 -3.890484e-04
6 2.550032e-05 2.586674e-04 -4.196917e-04 3.235528e-04 2.115955e-04 -3.627735e-04
7 -5.692790e-04 1.157675e-04 -2.277195e-04 -5.922595e-04 1.840773e-04 3.645036e-04
8 -2.258187e-04 1.445371e-04 -1.077903e-04 -3.583290e-04 2.386756e-04 5.422018e-04
9 3.812360e-04 -3.628313e-04 3.051868e-04 8.276013e-05 -2.870674e-04 5.122258e-04
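Note that the grid above yields both orderings of each pair (data1_data2 and data2_data1), whereas the loop in the question only regresses each column on the columns that come after it. If only those pairs are needed, a minimal sketch using combn could look like the following. The names pairs and pairResiduals are illustrative and not part of the original code, and this has only been tried on the small sample, not the full 400-column data:

```r
# Same small sample as above
Data <- read.table(text=" data1 data2 data3 data4
-0.710003 -0.714271 -0.709946 -0.713645
-0.710458 -0.715011 -0.710117 -0.714157
-0.71071 -0.714048 -0.710235 -0.713515
-0.710255 -0.713991 -0.709722 -0.71397
-0.710585 -0.714491 -0.710223 -0.713885
-0.710414 -0.714092 -0.710166 -0.71434
-0.711255 -0.714116 -0.70945 -0.714173
-0.71097 -0.714059 -0.70928 -0.714059
-0.710343 -0.714576 -0.709338 -0.713644", header=TRUE)

# Every unordered pair of column names (i < j), one pair per matrix column
pairs <- combn(names(Data), 2)

# Hypothetical helper: residuals of lm(first ~ second) for one pair
pairResiduals <- function(p) {
  residuals(lm(Data[[p[1]]] ~ Data[[p[2]]]))
}

# One column of residuals per pair, collected into a matrix
linearData <- apply(pairs, 2, pairResiduals)
colnames(linearData) <- paste(pairs[1, ], pairs[2, ], sep="_")
```

This halves the number of models compared with the full grid (6 instead of 12 for the sample), at the cost of dropping the reversed regressions such as data2_data1.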
Upvotes: 6