How could I make this R snippet faster and more R-ish?

Question

Coming from various other languages, I find R powerful and intuitive, but I am not thrilled with its performance. So I decided to try to improve a snippet I wrote and learn to write better R code.

Here is a function I wrote to determine whether a vector is binary-valued (it contains two distinct values, or just one):

isBinaryVector <- function(v) {
  if (length(v) == 0) {
    return (c(0, 1))
  }
  a <- v[1]
  b <- a
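  # Note: return (c()) only exits this anonymous function, not isBinaryVector,
  # and b = x assigns a local copy, so the outer b never changes.
  lapply(v, function(x) {
    if (x != a && x != b) {
      if (a != b) { return (c()) } else { b = x }
    }
  })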
  if (a < b) {
    return (c(a, b))
  } else {
    return (c(b, a))
  }
}
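For comparison, here is a sketch of the same single-pass algorithm written as a plain for loop, in which return() really leaves the function and b <- x updates the variable that the final comparison reads. isBinaryVectorLoop is a hypothetical name; it keeps the original's c(0, 1) result for empty input. It still loops at the R level, so it fixes the logic rather than the speed.

```r
# Sketch: the question's single-pass idea, rewritten with a for loop so that
# return() exits the outer function and b is actually updated.
isBinaryVectorLoop <- function(v) {
  if (length(v) == 0) {
    return(c(0, 1))            # kept from the original: empty input -> c(0, 1)
  }
  a <- v[1]
  b <- a
  for (x in v) {
    if (x != a && x != b) {
      if (a != b) {
        return(c())            # a third distinct value: not binary, returns NULL
      }
      b <- x                   # remember the second distinct value
    }
  }
  c(min(a, b), max(a, b))      # smaller value first
}

isBinaryVectorLoop(c(1, 2, 1))
```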

EDIT: The function is expected to scan a vector and return c() if it is not binary-valued, or c(a, b) if it is, where a is the smaller value and b the larger one (if a == b, just c(a, a)). For example, when I lapply isBinaryVector over my data, I get:

$A
[1] 1 1

$B
[1] 1 1

$C
[1] 0 0

On a moderately sized dataset (about 1800 × 3500, with about 2/3 of the vectors binary-valued), this takes about 15 seconds. The data contains only floating-point numbers.

Is there any way I could make this faster?

Thanks for any input!

Andrie · Accepted Answer

You are essentially trying to write a function that returns TRUE if a vector has exactly two unique values, and FALSE otherwise.

Try this:

> dat <- data.frame(
+   A = 1:3,
+   B = c(1, 2, 1), 
+   C = 0
+ )
> 
> sapply(dat, function(x)length(unique(x))==2)
    A     B     C 
FALSE  TRUE FALSE
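A small variation on the same test, offered alongside the answer rather than as part of it: vapply() takes a template for each result (logical(1) here) and stops with an error if any call returns something else, so the result is guaranteed to be a named logical vector. isTwoValued is just an illustrative name.

```r
# Same "exactly two distinct values" test as above, but vapply() enforces
# that each call returns a single logical value.
dat <- data.frame(A = 1:3, B = c(1, 2, 1), C = 0)
isTwoValued <- vapply(dat, function(x) length(unique(x)) == 2, logical(1))
print(isTwoValued)
```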

Next, you want the minimum and maximum values. The function range() returns both:

> sapply(dat, range)
     A B C
[1,] 1 1 0
[2,] 3 2 0
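Because range() already returns the smallest and largest value, a vector is binary-valued in the question's sense (one or two distinct values) exactly when every element equals one of the two. The sketch below shows that alternative check, which is not part of the answer; isBinaryRange is a hypothetical name, and it assumes each vector is non-empty and contains no NA. It avoids building the set of unique values, but whether it is actually faster than unique() on your data is something to measure, for example with system.time().

```r
# Hypothetical alternative: binary-valued (one or two distinct values)
# iff every element equals either the minimum or the maximum.
# Assumes x is non-empty and contains no NA.
isBinaryRange <- function(x) {
  r <- range(x)
  all(x == r[1] | x == r[2])
}

dat <- data.frame(A = 1:3, B = c(1, 2, 1), C = 0)
vapply(dat, isBinaryRange, logical(1))
```

Exact == comparisons are safe here even for floating-point data, because range() returns elements of x itself rather than computed values.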

That gives you all the ingredients for a small function that is easy to understand and should be fast even on large data, since unique() and range() do their work in compiled code instead of an R-level loop:

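# TRUE if x contains exactly two distinct values
isBinary <- function(x) length(unique(x)) == 2

# smallest and largest value if x is binary, NA otherwise
binaryValues <- function(x) {
  if (isBinary(x)) range(x) else NA
}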

sapply(dat, binaryValues)

$A
[1] NA

$B
[1] 1 2

$C
[1] NA
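The question also treats a single-valued vector as binary and asks for c() (that is, NULL) on non-binary input, whereas isBinary above requires exactly two values, so C comes back as NA. Here is a minimal sketch under the question's contract; binaryValuesQ is a hypothetical name, and the empty-vector case returns c(0, 1) only to mirror the original isBinaryVector.

```r
# Hypothetical variant following the question's contract:
#   one or two distinct values -> c(smaller, larger); otherwise NULL (i.e. c()).
binaryValuesQ <- function(x) {
  if (length(x) == 0) return(c(0, 1))  # mirrors isBinaryVector's empty case
  if (length(unique(x)) <= 2) range(x) else NULL
}

dat <- data.frame(A = 1:3, B = c(1, 2, 1), C = 0)
res <- lapply(dat, binaryValuesQ)
str(res)
```

lapply() is used because the results have different lengths (NULL versus length 2), so a list is the natural container; sapply() would fall back to a list here anyway.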
