F. Privé

Reputation: 11738

Subsetting with negative indices: best practices?

Say I have a function for subsetting (this is just a minimal example):

f <- function(x, ind = seq(length(x))) {
  x[ind]
}

(Note: one could use just seq(x) instead of seq(length(x)), but I don't find it very clear.)
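A side note on the choice of sequence generator, as a small base-R sketch (the names x1, x0, f_bad and f_safe are illustrative only): seq(x) and seq(length(x)) each misbehave in one edge case, while seq_along(x) handles both.

```r
# seq(x) changes meaning when x is a single number:
# it then counts from 1 up to the *value* of x, not over its positions.
x1 <- 7
seq(x1)              # 1 2 3 4 5 6 7
seq_along(x1)        # 1

# seq(length(x)) breaks for an empty vector: seq(0) counts down from 1 to 0.
x0 <- integer(0)
seq(length(x0))      # 1 0
seq_along(x0)        # integer(0)
seq_len(length(x0))  # integer(0)

# With the seq(length(x)) default, an empty x yields NA instead of integer(0),
# because index 1 is out of bounds and index 0 is dropped.
f_bad <- function(x, ind = seq(length(x))) x[ind]
f_bad(x0)            # NA

# A seq_along(x) default is safe in both edge cases.
f_safe <- function(x, ind = seq_along(x)) x[ind]
f_safe(x1)           # 7
f_safe(x0)           # integer(0)
```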

So, if

x <- 1:5
ind <- c(2, 4)
ind2 <- which(x > 5) # integer(0)

I have the following results:

f(x) 
[1] 1 2 3 4 5
f(x, ind)
[1] 2 4
f(x, -ind)
[1] 1 3 5
f(x, ind2)
integer(0)
f(x, -ind2)
integer(0)

For the last result, we would have wanted to get all of x, but since -integer(0) is still integer(0), x[-ind2] selects nothing. This is a common source of errors (as mentioned in the book Advanced R).
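A quick check of why the negative empty index fails, followed by one alternative that is not used elsewhere in this thread: removing positions with a logical mask instead of negative indices, which naturally keeps everything when there is nothing to remove. This is a sketch, not a claim about what Advanced R recommends.

```r
x <- 1:5
ind2 <- which(x > 5)   # integer(0)

# Negating an empty vector gives another empty vector ...
identical(-ind2, integer(0))   # TRUE

# ... so x[-ind2] means "select nothing", not "remove nothing".
x[-ind2]                       # integer(0)

# Logical-mask alternative: keep every position that is not listed.
# An empty removal set leaves the mask all TRUE, so all of x is kept.
x[!(seq_along(x) %in% ind2)]      # 1 2 3 4 5
x[!(seq_along(x) %in% c(2, 4))]   # 1 3 5
```

The mask costs an extra logical vector of length(x), so for very long vectors with frequent empty removals the length check used in the answers may still be preferable.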

So, if I want to make a function for removing indices, I use:

f2 <- function(x, ind.rm) {
  f(x, ind = `if`(length(ind.rm) > 0, -ind.rm, seq(length(x))))
}

Then I get what I wanted:

f2(x, ind)
[1] 1 3 5
f2(x, ind2)
[1] 1 2 3 4 5

My question is: can I do something cleaner that doesn't require passing seq(length(x)) explicitly in f2, and instead directly uses the default value of f's parameter ind when ind.rm is integer(0)?

Upvotes: 3

Views: 398

Answers (3)

F. Privé

Reputation: 11738

To implement "parameter1 = if (cond1) value1 else default_value_of_param1", I used formals() to retrieve the default arguments as unevaluated calls, which I then evaluate:

f <- function(x, ind.row = seq_len(nrow(x)), ind.col = seq_len(ncol(x))) {
  x[ind.row, ind.col]
}

f2 <- function(x, ind.row.rm = integer(0), ind.col.rm = integer(0)) {
  f.args <- formals(f)
  f(x, 
    ind.row = `if`(length(ind.row.rm) > 0, -ind.row.rm, eval(f.args$ind.row)),
    ind.col = `if`(length(ind.col.rm) > 0, -ind.col.rm, eval(f.args$ind.col)))
}

Then:

> x <- matrix(1:6, 2)

> f2(x, 1:2)
     [,1] [,2] [,3]

> f2(x, , 1:2)
[1] 5 6

> f2(x, 1, 2)
[1] 2 6

> f2(x, , 1)
     [,1] [,2]
[1,]    3    5
[2,]    4    6

> f2(x, 1, )
[1] 2 4 6

> f2(x)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
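One caveat with eval(f.args$ind.row): it is evaluated in f2's environment, so nrow(x) finds f2's x and the defaults work only because both functions happen to name that argument x. A possible alternative, sketched below under that observation, is to build the argument list and simply leave out the indices with nothing to remove, so that f evaluates its own defaults itself. The name f2_docall is hypothetical; this uses base R's do.call() and is not the approach from the answer above.

```r
f <- function(x, ind.row = seq_len(nrow(x)), ind.col = seq_len(ncol(x))) {
  x[ind.row, ind.col]
}

# Only add an index argument when there is something to remove;
# omitted arguments fall back on f's own defaults, evaluated inside f.
f2_docall <- function(x, ind.row.rm = integer(0), ind.col.rm = integer(0)) {
  call_args <- list(x)
  if (length(ind.row.rm) > 0) call_args$ind.row <- -ind.row.rm
  if (length(ind.col.rm) > 0) call_args$ind.col <- -ind.col.rm
  do.call(f, call_args)
}

x <- matrix(1:6, 2)
f2_docall(x)                  # the full 2 x 3 matrix
f2_docall(x, 1, 2)            # 2 6
f2_docall(x, ind.col.rm = 1)  # columns 2 and 3
```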

Upvotes: 0

John Coleman

Reputation: 52008

What you have isn't bad, but if you want to avoid passing the default value of the argument explicitly, you could restructure it like this:

f2 <- function(x, ind.rm) {
    `if`(length(ind.rm) > 0, f(x,-ind.rm), f(x))
}

which is slightly shorter than what you have.

On Edit

Based on the comments, it seems you want to be able to pass "nothing" to a function (rather than simply omitting the argument) so that it uses the default value. You can do so by writing the function so that it accepts a placeholder for "nothing", namely NULL. You can rewrite your f as:

f <- function(x, ind = NULL) {
    if(is.null(ind)){ind <- seq(length(x))}
    x[ind]
}

NULL acts as a flag that tells the receiving function to use a default value for the parameter, although that default value must then be set in the body of the function.

Now f2 can be rewritten as

f2 <- function(x, ind.rm) {
    f(x, ind = `if`(length(ind.rm) > 0, -ind.rm, NULL))
}

This is slightly more readable than what you have, but at the cost of making the original function slightly longer.
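The extra length in f can be kept to one line with a small null-default helper. The sketch below defines a hypothetical `%||%` operator locally (rlang exports an operator with the same name and meaning); it also swaps seq(length(x)) for seq_along(x), which behaves correctly for empty x.

```r
# Null-default helper: return `a` unless it is NULL, otherwise `b`.
`%||%` <- function(a, b) if (is.null(a)) b else a

f <- function(x, ind = NULL) {
  ind <- ind %||% seq_along(x)   # NULL means "use the default"
  x[ind]
}

f2 <- function(x, ind.rm) {
  # Pass NULL when there is nothing to remove, so f uses its default.
  f(x, ind = if (length(ind.rm) > 0) -ind.rm else NULL)
}

x <- 1:5
f2(x, c(2, 4))      # 1 3 5
f2(x, integer(0))   # 1 2 3 4 5
```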

Upvotes: 2

C8H10N4O2

Reputation: 19025

If you expect "empty" negative indices to be common, you can speed up those cases by returning x itself instead of computing x[seq(length(x))], which builds a new copy of the whole vector. In other words, if you are able to combine f and f2 into something like:

new_f <- function(x, ind.rm){
  if(length(ind.rm)) x[-ind.rm] else x
}

then there is a large speedup in the case of empty negative indices (about three orders of magnitude at the median in the benchmark below):

# f() and f2() are the versions defined in the question
n <- 1000000L
x <- 1:n
ind <- seq(0L,n,2L)
ind2 <- which(x>n+1) # integer(0)

library(microbenchmark)
microbenchmark(
  f2(x, ind),
  new_f(x, ind),
  f2(x, ind2),
  new_f(x, ind2)
)
all.equal(f2(x, ind), new_f(x, ind)) # TRUE - same result at about same speed
all.equal(f2(x, ind2), new_f(x, ind2)) # TRUE - same result at much faster speed

Unit: nanoseconds
           expr     min        lq        mean  median       uq      max neval
     f2(x, ind) 6223596 7377396.5 11039152.47 9317005 10271521 50434514   100
  new_f(x, ind) 6190239 7398993.0 11129271.17 9239386 10202882 59717093   100
    f2(x, ind2) 6823589 7992571.5 11267034.52 9217149 10568524 63417978   100
 new_f(x, ind2)     428    1283.5     5414.74    6843     7271    14969   100

Upvotes: 2
