R: Convert 3darray[i, j, ] to columns of df, fast and readable

Question

I'm working with 3-dimensional arrays and want each slice along the third dimension, for every position in the first two dimensions, to become a column in a data frame. I also want my code to be readable for people who don't use R regularly.

Looping over the first two dimensions is very readable but slow (30 secs for the example below), while the permute-flatten-shape-to-matrix approach is faster (14 secs) but not so readable.

Any suggestions for a nice solution?

Reproducible example here:

# Create data
    d1 <- 200
    d2 <- 100
    d3 <- 50
    data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))

# Idea 1: Loop
    df <- data.frame(var1 = rep(0, d3))
    i <- 1

    system.time(
    for (c in 1:d2) { 
        for(r in 1:d1){
          i <- i + 1
          df[[i]] <- data[r, c, ]
        }
    })

# Idea 2: Permute dimension of array first
    df2 <- data.frame(var1 = rep(0, d3))

    system.time({
    data.perm <- aperm(data, c(3, 1, 2))
    df2[, 2:(d1*d2 + 1)] <- matrix(c(data.perm), nrow = d3, ncol = d1*d2)}
    )

    identical(df, df2)

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

I would suggest a much simpler approach:

t(apply(data, 3, c))

I hope it meets your expectations of being fast and readable.

It is fast, as demonstrated in the timings below.
It is readable because it's a basic apply statement. All that is being done is using c to flatten the matrix at each index of the third dimension into a single vector; apply then simplifies those vectors into a two-dimensional array with one column per slice. The result just needs to be transposed so that each column corresponds to one position in the first two dimensions.
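To make the flatten-then-transpose step concrete, here is a minimal sketch on a tiny array. The name small and its dimensions are illustrative assumptions, not part of the original example; the column-index formula follows from R's column-major storage.

```r
# Tiny array: 2 rows, 3 columns, 4 slices along the third dimension
small <- array(1:24, dim = c(2, 3, 4))

# apply(small, 3, c) flattens each 2 x 3 slice into a length-6 vector,
# giving a 6 x 4 matrix with one column per slice
flat <- apply(small, 3, c)

# Transposing gives a 4 x 6 matrix: one column per (i, j) position
res <- t(flat)

# Position (i, j) ends up in column k = i + (j - 1) * nrow
identical(res[, 1], small[1, 1, ])  # k = 1 + (1 - 1) * 2 = 1
identical(res[, 4], small[2, 2, ])  # k = 2 + (2 - 1) * 2 = 4
```

Both comparisons should return TRUE, which shows that the row index varies fastest across the columns of the result.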

Here's your sample data:

set.seed(1)
d1 <- 200
d2 <- 100
d3 <- 50
data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))

Here are a few functions to compare:

funam <- function() t(apply(data, 3, c))
funrl <- function() {
  # One list slot per (r, c) position, so the list never has to grow
  myl <- vector("list", d1 * d2)
  i <- 0

  for (c in 1:d2) { 
    for(r in 1:d1){
      i <- i + 1
      myl[[i]] <- data[r, c, ]
    }
  }
  do.call(cbind, myl)
}

funop <- function() {
  df <- data.frame(var1 = rep(0, d3))
  i <- 1

  for (c in 1:d2) { 
    for(r in 1:d1){
      i <- i + 1
      df[[i]] <- data[r, c, ]
    }
  }
  df[-1]
}

Here are the results of the timing:

system.time(am <- funam())
#    user  system elapsed 
#   0.000   0.000   0.062 
system.time(rl <- funrl())
#    user  system elapsed 
#   3.980   0.000   1.375 
system.time(op <- funop())
#    user  system elapsed 
#  21.496   0.000  21.355

... and a comparison for equality:

all.equal(am, as.matrix(unname(op)), check.attributes = FALSE)
# [1] TRUE
all.equal(am, rl, check.attributes = FALSE)
# [1] TRUE
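Since the question asks for a data frame rather than a matrix, the result can be wrapped in as.data.frame. The sketch below uses a small stand-in array, and the r<i>_c<j> column naming scheme is purely a hypothetical convention chosen for illustration.

```r
# Small stand-in for the question's array (dimensions are illustrative)
d1 <- 3; d2 <- 2; d3 <- 4
data <- array(seq_len(d1 * d2 * d3), dim = c(d1, d2, d3))

# Flatten each slice, transpose, and convert the matrix to a data frame
df <- as.data.frame(t(apply(data, 3, c)))

# Hypothetical naming scheme: the row index varies fastest,
# matching the column-major order produced by c()
names(df) <- paste0("r", rep(seq_len(d1), times = d2),
                    "_c", rep(seq_len(d2), each = d1))

# The column for position (2, 2) holds the slice data[2, 2, ]
identical(df[["r2_c2"]], data[2, 2, ])
```

Naming the columns by position keeps the mapping back to the array explicit, which may help readers who are less familiar with how R orders array elements.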
