Why is sort slower than the order function in R?

Question

The title says it all. I would expect order to use sort internally to find the ordering of the values in a vector, so sort should be faster than order for sorting a vector. But this is not the case:

library(microbenchmark)
ss <- sample(100, 10000, replace = TRUE)
microbenchmark(sort(ss))
microbenchmark(ss[order(ss)])

Result:

> microbenchmark(sort(ss))
Unit: microseconds
    expr     min       lq     mean  median       uq      max neval
 sort(ss) 141.535 144.6415 173.6581 146.358 150.2295 2531.762   100
> microbenchmark(ss[order(ss)])
Unit: microseconds
        expr     min       lq     mean  median       uq     max neval
 ss[order(ss)] 109.198 110.9865 115.6275 111.901 115.3655 197.204   100

Example with a larger vector:

ss <- sample(100, 1e8, replace = TRUE)
microbenchmark(sort(ss), ss[order(ss)], times = 5)
# Unit: seconds
#           expr      min       lq     mean   median       uq      max neval
#       sort(ss) 5.427966 5.431971 5.892629 6.049515 6.207060 6.346633     5
#  ss[order(ss)] 3.381253 3.500134 3.562048 3.518079 3.625778 3.784997     5

Hugh · Accepted Answer

The default treatment of NA values differs between the two functions. With its default na.last = NA, sort must scan the entire vector for NA values and remove them; with its default na.last = TRUE, order simply puts them last. When na.last = TRUE is used in both, the performance is basically identical.

ss <- sample(100, 1e8, replace = TRUE)
bench::mark(sort(ss), ss[order(ss)], sort(ss, na.last = TRUE))
# A tibble: 3 x 14
#   expression    min   mean median    max `itr/sec` mem_alloc  n_gc n_itr total_time
# 1 sort(ss)   2.610s 2.610s 2.610s 2.610s     0.383 762.940MB     0     1     2.610s
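
The difference in default NA handling is easier to see on a tiny vector than in a benchmark. The following is a minimal sketch using base R only; the vector x and the variable names are made up for illustration and are not part of the original benchmark.

```r
# Small vector with one missing value (illustrative data, not from the benchmark)
x <- c(3L, NA, 1L, 2L)

# sort() defaults to na.last = NA, which removes missing values
sorted_default <- sort(x)
print(sorted_default)   # 1 2 3

# order() defaults to na.last = TRUE, so the index of the NA comes last
# and the NA is kept when subsetting
via_order <- x[order(x)]
print(via_order)        # 1 2 3 NA

# With na.last = TRUE, sort() keeps the NA and matches the order() result
sorted_keep <- sort(x, na.last = TRUE)
print(identical(sorted_keep, via_order))   # TRUE
```

So sort(ss) and ss[order(ss)] only do the same work when na.last is set to the same value in both calls, which is the comparison to make when benchmarking them.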
