Reputation: 5755
I'm a new R learner and trying to figure out a better way to quickly map things in R.
Often I need to add a color sidebar next to a heat map that indicates different phenotypes, and in a lot of cases I have this boolean vector that indicates whether it is A-type (opposite to B-type):
is.a.type <- c(T, T, F, F, F, T)
Now I need to map this vector into a "red"/"blue" vector, with TRUEs mapped to "red" and FALSEs mapped to "blue". In a lot of other languages this is usually a one-liner (for example, in Mathematica, we can do something like isAType /. {True -> "red", False -> "blue"}, which is concise, clear and elegant). But in R I don't know what an elegant (or "canonical") way to do this is.
One obvious way of doing this is of course using sapply:
sapply(is.a.type, function (x) if (x) "red" else "blue")
which to me seems clumsy, since it needlessly constructs a function. The other way I can think of is using R's indexing syntax:
colors <- is.a.type
colors[is.a.type] <- "red"
colors[!is.a.type] <- "blue"
which to me is clearer, but a little too verbose (I have to assign a temporary variable and refer to it several times). The third way I can think of is sort of a hack that takes advantage of the fact that booleans are coerced to integers in arithmetic:
c("blue", "red")[is.a.type+1]
It is the shortest, but I don't like it because it is cryptic, very specific to this particular problem, and hard to generalize.
Do you think there's a better solution? I'm actually looking for a generalizable approach to map things according to a simple rule in R.
Upvotes: 0
Views: 206
Reputation: 193517
If both speed and readability are concerns, this is probably the best trade-off (only the indexing trick from your question is faster in the timings below):
x <- rep("blue", length(is.a.type))
x[is.a.type] <- "red"
x
The other obvious alternative that I can think of is to use factor. This would be the most logical approach if your concern is to come up with a solution that is easy to generalize.
factor(is.a.type, c(TRUE, FALSE), c("red", "blue"))
This should be reasonably fast: faster than your basic subsetting and replacement approach, at least.
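One caveat with this approach is that factor() returns a factor rather than a character vector, so code that expects plain colour names may need an as.character() call. Here is a minimal sketch, reusing the six-element vector from the question. The `phenotype` vector and its colour assignments are made up purely to illustrate how the pattern extends beyond two categories:

```r
is.a.type <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

# factor() relabels each level; the result is a factor, not a character vector
f <- factor(is.a.type, levels = c(TRUE, FALSE), labels = c("red", "blue"))
side.colors <- as.character(f)

# The same pattern extends to more than two categories
# (hypothetical phenotype codes, for illustration only)
phenotype <- c("A", "C", "B", "A")
pheno.colors <- as.character(
  factor(phenotype, levels = c("A", "B", "C"), labels = c("red", "blue", "green"))
)
```

The levels and labels vectors are matched by position, so adding a category means adding one entry to each.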
Here are some timings, with @JeremyS's sample data:
AMfun1 <- function() factor(is.a.type, c(TRUE, FALSE), c("red", "blue"))
AMfun2 <- function() {
  x <- rep("blue", length(is.a.type))
  x[is.a.type] <- "red"
  x
}
OPfun1 <- function() {
  colors <- is.a.type
  colors[is.a.type] <- "red"
  colors[!is.a.type] <- "blue"
  colors
}
OPfun2 <- function() {
  c("blue", "red")[is.a.type+1]
}
library(microbenchmark)
microbenchmark(AMfun1(), AMfun2(), OPfun1(), OPfun2(), times = 20)
# Unit: milliseconds
#      expr       min        lq    median        uq        max neval
#  AMfun1() 6712.2610 6828.3065 7317.3582 7558.5444  8327.1019    20
#  AMfun2() 1055.2700 1114.6305 1192.7697 1285.2160  1341.8424    20
#  OPfun1() 8366.5327 8737.7971 9134.3010 9589.4956 10557.5743    20
#  OPfun2()  483.5799  530.0979  559.4926  592.9353   703.8037    20
Upvotes: 1
Reputation: 3525
In R you can do things in a lot of different ways, a lot! The trick is finding the fastest way and using that. First: vectorise
is.a.type <- sample(c(T, F),1e07,replace=T)
system.time(res <- sapply(is.a.type, function (x) if (x) "red" else "blue"))
user system elapsed
23.921 0.068 24.040 # SLOW
Colors <- function(x) {
  x <- as.character(x) # This step seems odd, but makes it considerably faster
  x[x == "TRUE"] <- "red"
  x[x == "FALSE"] <- "blue"
  return(x)
}
system.time(res2 <- Colors(is.a.type))
user system elapsed
4.248 0.000 4.256 # Vectorised = best
system.time(res3 <- ifelse(is.a.type, "red", "blue"))
user system elapsed
7.417 0.132 7.560 # OK (ifelse is vectorised too), but slower than the Colors function above
system.time(res <- c("blue", "red")[is.a.type+1])
user system elapsed
0.276 0.080 0.357 # fastest but like you said, cryptic
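Before picking one of these on speed alone, it may be worth confirming that they return the same thing. A small sanity check, assuming the six-element vector from the question rather than the 1e07-element sample (the `by.*` names are just for this illustration):

```r
is.a.type <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

by.ifelse <- ifelse(is.a.type, "red", "blue")
by.index <- c("blue", "red")[is.a.type + 1]

# Fill with the default, then overwrite the TRUE positions
by.replace <- rep("blue", length(is.a.type))
by.replace[is.a.type] <- "red"

# Both comparisons should be TRUE if the approaches agree
same.1 <- identical(by.ifelse, by.index)
same.2 <- identical(by.index, by.replace)
```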
R is all about making your own functions to do specific things. I wouldn't think of it as "unnecessary construction of a function" at all, but rather as making use of the way R was designed.
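As one possible illustration of that idea, the indexing trick can be wrapped in a small helper that takes a named lookup vector, which reads more like the rule-based mapping asked for in the question. This is only a sketch: `map_values` is a hypothetical name, not a base R function.

```r
# map_values() is a hypothetical helper, not part of base R
map_values <- function(x, lookup) {
  # Index the named lookup vector by the character form of x;
  # unname() drops the names that this kind of indexing carries over
  unname(lookup[as.character(x)])
}

is.a.type <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
side.colors <- map_values(is.a.type, c("TRUE" = "red", "FALSE" = "blue"))
```

Values that have no entry in the lookup vector come back as NA, which makes unmapped inputs easy to spot.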
Side note: colors is already a function in R (from grDevices), so reusing that name for a variable can cause confusion.
Upvotes: 1