Christoph Hanck
Christoph Hanck

Reputation: 353

recursively sort a vector in R

I have a task like

n <- 5
set.seed(11)
X <- rnorm(n)

X.sort <- {}
for(i in 1:n){
  X.sort <- sort.int(c(X.sort, X[i]), decreasing = TRUE)
  print(X.sort) # actually, other computations with X.sort
}

resulting in output of the sort of

[1] -0.5910311
[1]  0.02659437 -0.59103110
[1]  0.02659437 -0.59103110 -1.51655310
[1]  0.02659437 -0.59103110 -1.36265335 -1.51655310
[1]  1.17848916  0.02659437 -0.59103110 -1.36265335 -1.51655310

It strikes me as inefficient to "re-sort" X.sort in each round of the loop, when the vector is already sorted except for the new entry X[i] that is to be inserted.

I have tried "telling" R where to insert the element via

library(R.utils)

X.sort <- {}
for(i in 1:n){
  pos <- match(F, X.sort>X[i])
  if(is.na(pos)){
    X.sort <- c(X.sort,X[i])
  } else {
    X.sort <- insert(X.sort, pos, X[i])                      
  }
  print(X.sort)
}

but that does not yield any gains when benchmarking.

Is there an obvious improvement, or does R already efficiently exploit the knowledge that X.sort is sorted?

EDIT:

Benchmarking suggests [BUT PLEASE ALSO CONSIDER THE REPLIES BELOW] the accepted answer to be quickest (at least when n gets close to 1,000), which moreover appears to also work for large n, and to be the simplest one.

library(R.utils)
library(microbenchmark)
n <- 600
set.seed(11)
X <- rnorm(n)

sorted_insert <- function(x, y) { 
  c(x[x >= y], y, x[x < y]) 
}

recursive_fun <- function(ans=list(NULL), vec, i=1) { 
  if (i > length(vec)) {
    tail(ans, -1)
  } else {
    ans <- c(ans, list(sorted_insert(ans[[i]], vec[i]))) 
    recursive_fun(ans=ans, vec, i=i+1) 
  }
}

microbenchmark(
  {
    X.sort <- {}
    for(i in 1:n){
      X.sort <- sort.int(c(X.sort, X[i]), decreasing = TRUE)
    }
  },{
    X.sort <- {}
    for(i in 1:n){
      pos <- match(F, X.sort>X[i])
      if(is.na(pos)){
        X.sort <- c(X.sort,X[i])
      } else {
        X.sort <- insert(X.sort, pos, X[i])                      
      }
    }
    },{
    X.sort <- {X[1]}
    for(i in 2:n){
      X.sort <- append(X.sort, X[i], after = sum(X.sort > X[i]))
    }
  },{
    lapply(seq_along(X), function(a) {sort(X[seq_len(a)], decreasing = T)})
  },{
    lapply(1:length(X), function(i) sort(X[1:i], decreasing = T))
  },
  {
    recursive_fun(vec=X)
  },
  times=50
)

Result:

       min        lq      mean    median        uq       max neval
 21.308012 22.264314 24.065012 22.798643 26.381362 34.629395    50
 19.554413 20.334643 21.875769 20.617807 24.085896 30.625841    50
  4.497919  4.804550  5.380192  4.912923  5.114310 13.522485    50
 23.540616 24.105807 25.311692 24.335780 24.985024 30.348792    50
 23.251905 24.067122 25.722031 24.745380 27.986197 30.010018    50
  3.928746  4.096568  4.358911  4.258701  4.390684  9.106202    50

Upvotes: 0

Views: 403

Answers (3)

Shree
Shree

Reputation: 11140

Here's a way with lapply -

n <- 5
set.seed(11)
x <- rnorm(n)

lapply(seq_along(x), function(a) {
  sort(x[seq_len(a)], decreasing = T)
})

[[1]]
[1] -0.5910311

[[2]]
[1]  0.02659437 -0.59103110

[[3]]
[1]  0.02659437 -0.59103110 -1.51655310

[[4]]
[1]  0.02659437 -0.59103110 -1.36265335 -1.51655310

[[5]]
[1]  1.17848916  0.02659437 -0.59103110 -1.36265335 -1.51655310

For performance, you should consider @cyrilb38 approach with Reduce. See benchmarks below -

n <- 600
set.seed(11)
x <- rnorm(n)

r_sort <- function(x, y) {
  append(x, y, after = sum(x > y))
}


microbenchmark(
  lapply = lapply(seq_along(x), function(a) {
  sort(x[seq_len(a)], decreasing = T)
  }),
  forloop = {
    x.sort <- x[1]
    for(i in 2:n){
      x.sort <- append(x.sort, x[i], after = sum(x.sort > x[i]))
    }
  },
  Reduce = Reduce(r_sort, as.list(x), accumulate = T), # only if you want intermediate results
  Reduce2 = Reduce(r_sort, as.list(x)),
  times = 50
)

Unit: milliseconds
    expr       min        lq      mean    median        uq       max neval
  lapply 35.069533 36.318154 45.302362 37.870738 41.959249 203.45682    50
 forloop  8.366112  8.743501 11.196852  9.128391 11.800904  30.76272    50
  Reduce  4.574459  4.861448  7.418195  5.332593 11.076522  22.40293    50
 Reduce2  4.556300  4.754075  6.918486  5.161860  9.563809  14.41776    50

Upvotes: 0

CPak
CPak

Reputation: 13581

You could try this recursive approach

The working function is sorted_insert, which inserts the new-element at the vector position between elements >= new-element, vec[vec >= y], and < new-element, vec[vec < y]. This assumes that the vector is always sorted (which is true in this case).

sorted_insert <- function(x, y) { 
    c(x[x >= y], y, x[x < y]) 
}

sorted_function is called recursively by recursive_fun. If the counter i is <= the length of the vector (i.e. the entire unsorted-input-vector has not been traversed) then it will call sorted_function using the previous sorted-answer, ans[[i]] as the input-vector and vec[i] as the new-element to insert. That is, the sorted-vector at each iteration is built up using the sorted-vector from the previous iteration and a new element from the unsorted-input-vector. Sorry, tried my best to explain it.

recursive_fun <- function(ans=list(NULL), vec, i=1) { 
    if (i > length(vec)) {
        tail(ans, -1)
    } else {
        ans <- c(ans, list(sorted_insert(ans[[i]], vec[i]))) 
        recursive_fun(ans=ans, vec, i=i+1) 
    }
}

Using the given example

n <- 5
set.seed(11)
X <- rnorm(n)
recursive_fun(vec=X)

A larger example

n <- 1000
set.seed(11)
X <- rnorm(n)
recursive_fun(vec=X)

There is a problem if your unsorted-input-vector is large

n <- 10000
set.seed(11)
X <- rnorm(n)
recursive_fun(vec=X)
# Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

Note that if you do not want to collect the result at each iteration, you can use Reduce(sorted_insert, X), which should be fast.

Upvotes: 1

cyrilb38
cyrilb38

Reputation: 944

The bottleneck in your code is actually the print statement.

Here is another way that is about 5x faster (if you do not need to print) :

n <- 10000
set.seed(11)
X <- rnorm(n)

X.sort <- {X[1]}
for(i in 2:n){
  X.sort <- append(X.sort, X[i], after = sum(X.sort > X[i]))
}

Upvotes: 2

Related Questions