Reputation: 361
I'm working with 3-dimensional arrays and want to have slices along the third dimension for each position in the first two dimensions as columns in a data frame. I also want my code to be readable for people who dont use R regularly.
Looping over the first two dimensions is very readable but slow (30 secs for the example below), while the permute-flatten-shape-to-matrix approach is faster (14 secs) but not so readable.
Any suggestions for a nice solution?
# Create data
d1 <- 200
d2 <- 100
d3 <- 50
data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))
# Idea 1: Loop
df <- data.frame(var1 = rep(0, d3))
i <- 1
system.time(
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
df[[i]] <- data[r, c, ]
}
})
# Idea 2: Permute dimension of array first
df2 <- data.frame(var1 = rep(0, d3))
system.time({
data.perm <- aperm(data, c(3, 1, 2))
df2[, 2:(d1*d2 + 1)] <- matrix(c(data.perm), nrow = d3, ncol = d1*d2)}
)
identical(df, df2)
Upvotes: 1
Views: 46
Reputation: 193587
I would suggest a much more simple approach:
t(apply(data, 3, c))
I hope it suits your expectations of being fast and readable.
apply
statement. All that is being done is using c
to convert the matrix
in each third dimension to a single vector in each third dimension, which then simplifies to a two-dimensional array
. The result just needs to be t
ransposed....Here's your sample data:
set.seed(1)
d1 <- 200
d2 <- 100
d3 <- 50
data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))
Here are a few functions to compare:
funam <- function() t(apply(data, 3, c))
funrl <- function() {
myl <- vector("list", d3)
i <- 1
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
myl[[i]] <- data[r, c, ]
}
}
do.call(cbind, myl)
}
funop <- function() {
df <- data.frame(var1 = rep(0, d3))
i <- 1
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
df[[i]] <- data[r, c, ]
}
}
df[-1]
}
Here are the results of the timing:
system.time(am <- funam())
# user system elapsed
# 0.000 0.000 0.062
system.time(rl <- funrl())
# user system elapsed
# 3.980 0.000 1.375
system.time(op <- funop())
# user system elapsed
# 21.496 0.000 21.355
... and a comparison for equality:
all.equal(am, as.matrix(unname(op)), check.attributes = FALSE)
# [1] TRUE
all.equal(am, rl, check.attributes = FALSE)
# [1] TRUE
Upvotes: 1
Reputation: 70643
Here's an idea. Recommended read would be The R Inferno by Patrick Burns (pun intended?).
myl <- vector("list", d3) # create an empty list
i <- 1
system.time(
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
myl[[i]] <- data[r, c, ]
}
})
user system elapsed
1.8 0.0 1.8
# bind each list element into a matrix, column-wise
do.call("cbind", myl)[1:5, 1:5]
[,1] [,2] [,3] [,4] [,5]
[1,] -0.3394909 0.1266012 -0.4240452 0.2277654 -2.04943585
[2,] 1.6788653 -2.9381127 0.5781967 -0.7248759 -0.19482647
[3,] -0.6002371 -0.3132874 1.0895175 -0.2766891 -0.02109013
[4,] 0.5215603 -0.2805730 -1.0325867 -1.5373842 -0.14034565
[5,] 0.6063638 1.6027835 0.5711185 0.5410889 -1.77109124
Upvotes: 1