xzhu

Reputation: 5755

How do I map a vector according to a simple set of rules?

I'm a new R learner and trying to figure out a better way to quickly map things in R.

Often I need to add a color sidebar next to a heat map that indicates different phenotypes, and in a lot of cases I have this boolean vector that indicates whether it is A-type (opposite to B-type):

is.a.type <- c(T, T, F, F, F, T)

Now I need to map this vector to a vector of "red" and "blue", with TRUE mapped to "red" and FALSE mapped to "blue". In many other languages this is a one-liner (for example, in Mathematica we can write isAType /. {True -> "red", False -> "blue"}, which is concise, clear and elegant). But I don't know what the elegant (or "canonical") way to do this in R is.

One obvious way to do this is, of course, sapply:

sapply(is.a.type, function (x) if (x) "red" else "blue")

which strikes me as clumsy, since it constructs a function unnecessarily. Another way I can think of is R's indexing syntax:

colors <- is.a.type
colors[is.a.type] <- "red"
colors[!is.a.type] <- "blue"

which seems clearer to me, but a little too verbose (I have to assign a temporary variable and refer to it several times). The third way I can think of is something of a hack that takes advantage of the fact that booleans are coerced to integers in arithmetic:

c("blue", "red")[is.a.type+1]

It is the shortest, but I don't like it because it is cryptic, specific to this particular problem, and hard to generalize.

Do you think there's a better solution? I'm actually looking for a generalizable approach to map things according to a simple rule in R.

Upvotes: 0

Views: 206

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

If speed and readability are concerns, this might be the fastest option:

x <- rep("blue", length(is.a.type))
x[is.a.type] <- "red"
x

The other obvious alternative that I could think of is to use factor. This would be the most logical approach if your concern is to come up with a solution that is easy to generalize.

factor(is.a.type, c(TRUE, FALSE), c("red", "blue"))

This should be reasonably fast, or at least faster than your basic subsetting and replacement approach.
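One caveat with the factor approach is that the result is a factor, not a character vector. A minimal sketch of converting it back, and of extending the same pattern to more than two categories, might look like this. The `phenotype` vector and its three codes are made up for illustration and are not from the question.

```r
# Sample data from the question
is.a.type <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

# factor() returns a factor, not a character vector
f <- factor(is.a.type, levels = c(TRUE, FALSE), labels = c("red", "blue"))

# Wrap in as.character() when plain strings are needed
cols <- as.character(f)

# The same pattern extends to more than two categories
# (hypothetical phenotype codes, assumed for illustration)
phenotype <- c("A", "B", "C", "A")
pheno.cols <- as.character(
  factor(phenotype, levels = c("A", "B", "C"),
         labels = c("red", "blue", "green"))
)
```

Most plotting functions that take a color argument want character strings, so the as.character() step is usually worth keeping.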

Here are some timings, with @JeremyS's sample data:

AMfun1 <- function() factor(is.a.type, c(TRUE, FALSE), c("red", "blue"))
AMfun2 <- function() {
    x <- rep("blue", length(is.a.type))
    x[is.a.type] <- "red"
    x
}
OPfun1 <- function() {
  colors <- is.a.type
  colors[is.a.type] <- "red"
  colors[!is.a.type] <- "blue"
  colors
}
OPfun2 <- function() {
  c("blue", "red")[is.a.type+1]
}

library(microbenchmark)
microbenchmark(AMfun1(), AMfun2(), OPfun1(), OPfun2(), times = 20)
# Unit: milliseconds
#      expr       min        lq    median        uq        max neval
#  AMfun1() 6712.2610 6828.3065 7317.3582 7558.5444  8327.1019    20
#  AMfun2() 1055.2700 1114.6305 1192.7697 1285.2160  1341.8424    20
#  OPfun1() 8366.5327 8737.7971 9134.3010 9589.4956 10557.5743    20
#  OPfun2()  483.5799  530.0979  559.4926  592.9353   703.8037    20

Upvotes: 1

JeremyS

Reputation: 3525

In R there are a lot of different ways to do things. The trick is finding the fastest one and using that. First: vectorise.

is.a.type <- sample(c(TRUE, FALSE), 1e7, replace = TRUE)

system.time(res <- sapply(is.a.type, function (x) if (x) "red" else "blue"))
#    user  system elapsed
#  23.921   0.068  24.040  # SLOW


Colors <- function(x) {
  x <- as.character(x) # This step seems odd, but makes it considerably faster
  x[x == "TRUE"] <- "red"
  x[x == "FALSE"] <- "blue"
  return(x)
}

system.time(res2 <- Colors(is.a.type))
#    user  system elapsed
#   4.248   0.000   4.256  # Vectorised = best

system.time(res3 <- ifelse(is.a.type, "red", "blue"))
#    user  system elapsed
#   7.417   0.132   7.560  # OK, but slower than the Colors() function above

system.time(res <- c("blue", "red")[is.a.type+1])
#    user  system elapsed
#   0.276   0.080   0.357  # fastest but, like you said, cryptic
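A less cryptic relative of the index trick is a named lookup vector, which also generalizes beyond TRUE/FALSE. This is only a sketch using the small vector from the question. The name `lookup` is my own choice, and I have not timed it against the options above.

```r
# Sample data from the question
is.a.type <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

# Lookup table: names are the inputs (as strings), values are the outputs
lookup <- c("TRUE" = "red", "FALSE" = "blue")

# Index by name; unname() drops the names carried over from the lookup
cols <- unname(lookup[as.character(is.a.type)])
cols
```

Adding another rule is just a matter of adding another named element to `lookup`.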

R is all about making your own functions to do specific things. I wouldn't think of it as "unnecessary construction of a function" at all, but rather as making use of the way R was designed.
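In that spirit, here is a rough sketch of a small reusable mapper built on match(). The name `map_values` and its arguments are hypothetical (it is not a base R function), and the `default` fallback for values with no rule is an assumption about what you would want.

```r
# map_values() is a hypothetical helper, not part of base R
map_values <- function(x, from, to, default = NA) {
  stopifnot(length(from) == length(to))
  idx <- match(x, from)        # position of each element of x in `from`
  out <- to[idx]               # pick the corresponding output value
  out[is.na(idx)] <- default   # values with no rule fall back to `default`
  out
}

# The red/blue case from the question
map_values(c(TRUE, TRUE, FALSE), from = c(TRUE, FALSE), to = c("red", "blue"))

# More than two rules, with a fallback for unmapped values
map_values(c("A", "C", "B", "Z"), from = c("A", "B", "C"),
           to = c("red", "blue", "green"), default = "grey")
```

Because match() is vectorised, this avoids the per-element function calls that make sapply slow.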

Side note: colors is already the name of a function in R, so assigning a variable to that name can cause trouble.

Upvotes: 1
