skeletor

Reputation: 361

R: Convert 3darray[i, j, ] to columns of df, fast and readable

I'm working with 3-dimensional arrays. For each position (i, j) in the first two dimensions, I want the slice along the third dimension, array[i, j, ], to become one column of a data frame. I also want the code to be readable for people who don't use R regularly.

Looping over the first two dimensions is very readable but slow (about 30 seconds for the example below). Permuting the array, flattening it and reshaping it into a matrix is faster (about 14 seconds) but less readable.

Any suggestions for a nice solution?

Reproducible example here:

    # Create data
    d1 <- 200
    d2 <- 100
    d3 <- 50
    data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))

    # Idea 1: Loop over the first two dimensions
    df <- data.frame(var1 = rep(0, d3))  # placeholder first column
    i <- 1

    system.time(
      for (c in 1:d2) {
        for (r in 1:d1) {
          i <- i + 1
          df[[i]] <- data[r, c, ]  # slice along the third dimension
        }
      }
    )

    # Idea 2: Permute the dimensions of the array first
    df2 <- data.frame(var1 = rep(0, d3))  # placeholder first column

    system.time({
      data.perm <- aperm(data, c(3, 1, 2))  # move the third dimension to the front
      df2[, 2:(d1*d2 + 1)] <- matrix(c(data.perm), nrow = d3, ncol = d1*d2)
    })

    identical(df, df2)

Upvotes: 1

Views: 46

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193587

I would suggest a much simpler approach:

t(apply(data, 3, c))

I hope it meets your requirements of being fast and readable:

  • Fast, as demonstrated in the timings below.
  • Readable, because it's a basic apply statement. It uses c to flatten the matrix at each index of the third dimension into a single vector, and apply simplifies those vectors into a two-dimensional array with one column per slice. The result just needs to be transposed.
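To make the flattening step concrete, here is a minimal sketch on a toy array. The name `small` and its dimensions are assumptions chosen for illustration; they are not part of the original example.

```r
# Toy array (hypothetical): 2 rows, 3 columns, 4 slices along the third dimension
small <- array(1:24, dim = c(2, 3, 4))

# apply() over margin 3 passes each 2 x 3 slice to c(), which flattens it
# column by column into a vector of length 6; apply() binds those vectors
# together as the columns of a 6 x 4 matrix
flat <- apply(small, 3, c)
dim(flat)

# Transposing gives one row per slice and one column per (row, col) position
res <- t(flat)
dim(res)

# Column k of res holds position (r, c), where k = r + (c - 1) * nrow.
# For r = 1, c = 2 and nrow = 2 this is column 3.
all(res[, 3] == small[1, 2, ])
```

The column order (row index varying fastest) should match the order produced by the nested loop in the question, where the row loop is the inner one.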

Here's your sample data:

set.seed(1)
d1 <- 200
d2 <- 100
d3 <- 50
data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))

Here are a few functions to compare:

funam <- function() t(apply(data, 3, c))
funrl <- function() {
  myl <- vector("list", d1 * d2)  # one slot per (row, column) position
  i <- 0

  for (c in 1:d2) { 
    for(r in 1:d1){
      i <- i + 1
      myl[[i]] <- data[r, c, ]
    }
  }
  do.call(cbind, myl)
}

funop <- function() {
  df <- data.frame(var1 = rep(0, d3))
  i <- 1

  for (c in 1:d2) { 
    for(r in 1:d1){
      i <- i + 1
      df[[i]] <- data[r, c, ]
    }
  }
  df[-1]
}

Here are the results of the timing:

system.time(am <- funam())
#    user  system elapsed 
#   0.000   0.000   0.062 
system.time(rl <- funrl())
#    user  system elapsed 
#   3.980   0.000   1.375 
system.time(op <- funop())
#    user  system elapsed 
#  21.496   0.000  21.355 

... and a comparison for equality:

all.equal(am, as.matrix(unname(op)), check.attributes = FALSE)
# [1] TRUE
all.equal(am, rl, check.attributes = FALSE)
# [1] TRUE
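Since the question asks for a data frame rather than a matrix, the result can be wrapped in as.data.frame. The following is only a sketch under assumptions: the array `arr`, its small dimensions and the `r<i>_c<j>` column naming scheme are made up for illustration.

```r
# Small stand-in for the array in the question (hypothetical dimensions)
d1 <- 4; d2 <- 3; d3 <- 5
arr <- array(rnorm(d1 * d2 * d3), dim = c(d1, d2, d3))

m <- t(apply(arr, 3, c))  # d3 rows, d1 * d2 columns

# Label each column with its (row, col) position. expand.grid() varies its
# first argument fastest, which matches the order in which c() flattens a slice.
pos <- expand.grid(r = seq_len(d1), c = seq_len(d2))
colnames(m) <- paste0("r", pos$r, "_c", pos$c)

# Convert to a data frame; the column names carry over
df <- as.data.frame(m)

# The column for position (2, 3) should equal the slice arr[2, 3, ]
all(df[["r2_c3"]] == arr[2, 3, ])
```

Named columns make it easier to check a single position by eye, which may help readers who do not use R regularly.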

Upvotes: 1

Roman Luštrik

Reputation: 70643

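Here's an idea: collect the slices in a list and bind them into a matrix at the end, instead of growing a data frame column by column. A recommended read is The R Inferno by Patrick Burns (pun intended?).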

myl <- vector("list", d1 * d2) # preallocate one list slot per (row, column) position
i <- 0

system.time(
  for (c in 1:d2) { 
    for(r in 1:d1){
      i <- i + 1
      myl[[i]] <- data[r, c, ]
    }
  })

user  system elapsed 
 1.8     0.0     1.8 

# bind each list element into a matrix, column-wise
do.call("cbind", myl)[1:5, 1:5]

           [,1]       [,2]       [,3]       [,4]        [,5]
[1,] -0.3394909  0.1266012 -0.4240452  0.2277654 -2.04943585
[2,]  1.6788653 -2.9381127  0.5781967 -0.7248759 -0.19482647
[3,] -0.6002371 -0.3132874  1.0895175 -0.2766891 -0.02109013
[4,]  0.5215603 -0.2805730 -1.0325867 -1.5373842 -0.14034565
[5,]  0.6063638  1.6027835  0.5711185  0.5410889 -1.77109124
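As a cross-check, the list-and-cbind approach can be compared with the permute-and-reshape approach from the question on a small array. This is a sketch under assumptions: `arr`, `cols`, `looped`, `permuted` and the small dimensions are hypothetical names and values, and the list index is computed directly from the position instead of using a running counter.

```r
# Small stand-in for the array in the question (hypothetical dimensions)
d1 <- 4; d2 <- 3; d3 <- 5
arr <- array(rnorm(d1 * d2 * d3), dim = c(d1, d2, d3))

# Preallocate one list slot per (row, col) position, then fill by position
cols <- vector("list", d1 * d2)
for (j in seq_len(d2)) {
  for (i in seq_len(d1)) {
    cols[[i + (j - 1) * d1]] <- arr[i, j, ]
  }
}
looped <- do.call(cbind, cols)

# Move the third dimension to the front, flatten, and reshape to d3 rows
permuted <- matrix(as.vector(aperm(arr, c(3, 1, 2))), nrow = d3)

# Both matrices should hold the same values in the same column order
all.equal(looped, permuted)
```

Filling a preallocated list avoids the repeated copying that comes with adding columns to a data frame one at a time, which is the likely reason the list version is so much faster than the original loop.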

Upvotes: 1
