Why does which work faster on a data frame column compared to a matrix column?

Question

I have the following data:

height = 1:10000000
length = -(1:10000000)
body_dim = data.frame(height,length)
body_dim_mat = as.matrix(body_dim)

Why does which() work faster for the data frame compared to the matrix?

> microbenchmark(body_dim[which(body_dim$height==50000),"length"])
Unit: milliseconds
                                                expr      min       lq   median       uq      max neval
 body_dim[which(body_dim$height == 50000), "length"] 124.4586 125.1625 125.9281 127.9496 284.9824   100

> microbenchmark(body_dim_mat[which(body_dim_mat[,1] == 50000),2])
Unit: milliseconds
                                               expr      min       lq   median      uq     max neval
 body_dim_mat[which(body_dim_mat[, 1] == 50000), 2] 251.1282 252.4457 389.7251 400.313 1004.25   100

Roland · Accepted Answer

A data.frame is a list, and each column is a plain vector that is very easy to extract from that list. A matrix is a single vector with a dimension attribute, so which values belong to a given column has to be calculated from the dimensions. This affects subsetting, which you include in your benchmarks:

library(microbenchmark)

set.seed(42)
m <- matrix(rnorm(1e5), ncol=10)
DF <- as.data.frame(m)

microbenchmark(m[,1], DF[,1], DF$V1)
# Unit: microseconds
#     expr    min     lq median      uq      max neval
#   m[, 1] 80.997 82.536 84.230 87.1560 1147.795   100
#  DF[, 1] 15.399 16.939 20.789 22.6365  100.090   100
#    DF$V1  1.849  2.772  3.389  4.3130   90.235   100
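
The structural difference behind these timings can be inspected directly. The following is only a minimal sketch on tiny objects; the names m_small and df_small are made up for illustration and are not part of the benchmark:

```r
# Tiny illustrative objects (hypothetical names, not from the benchmark)
m_small  <- matrix(1:6, ncol = 2)
df_small <- as.data.frame(m_small)

# A data.frame is a list with one element per column
is.list(df_small)    # TRUE
length(df_small)     # 2

# A matrix is one atomic vector plus a dim attribute
is.list(m_small)     # FALSE
dim(m_small)         # 3 2

# Column j of a matrix sits at positions (j - 1) * nrow + 1 ... j * nrow
# of the underlying vector (column-major storage)
j   <- 2
idx <- ((j - 1) * nrow(m_small) + 1):(j * nrow(m_small))
identical(m_small[, j], as.vector(m_small)[idx])    # TRUE

# The data.frame column is the same vector, stored as its own list element
identical(df_small$V2, m_small[, j])                # TRUE
```

So DF$V1 only has to hand back an existing list element, whereas m[, 1] has to work out which elements of the underlying vector belong to the column and collect them into a new vector.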

However, the take-home message is not that you should always use a data.frame, because subsetting where the result is not a vector is much faster on a matrix:

microbenchmark(m[1:10, 1:10], DF[1:10, 1:10])
# Unit: microseconds
#           expr     min       lq   median      uq      max neval
#  m[1:10, 1:10]   1.233   1.8490   3.2345   3.697   11.087   100
# DF[1:10, 1:10] 211.267 219.7355 228.2050 252.226 1265.131   100
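
To make "the result is not a vector" concrete, here is a small sketch that rebuilds the same m and DF as above and compares what the two subsetting calls return. The names sub_m and sub_df are introduced only for this illustration:

```r
set.seed(42)
m  <- matrix(rnorm(1e5), ncol = 10)
DF <- as.data.frame(m)

# Hypothetical names for the two results
sub_m  <- m[1:10, 1:10]
sub_df <- DF[1:10, 1:10]

# The matrix subset is again a plain matrix
is.matrix(sub_m)     # TRUE
dim(sub_m)           # 10 10

# The data.frame subset is a new data.frame, i.e. a new list of columns
class(sub_df)        # "data.frame"
is.list(sub_df)      # TRUE

# Both hold the same values; only the container differs
isTRUE(all.equal(sub_m, as.matrix(sub_df), check.attributes = FALSE))    # TRUE
```

The values are identical, but the data.frame call goes through the `[.data.frame` method, which is written in R and has to assemble a complete data.frame again, while the matrix call is handled by the internal `[`.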
