Reputation: 477
I have data as follows:
RNA$MMP2
[1] 1.506000000 0.957833333 2.285500000 -0.089333333 -1.233166667
[6] 1.591500000 -1.396500000 -0.260500000 0.583000000 -0.716333333
[11] 1.628833333 -0.390000000 -0.466166667 -0.550666667 1.001666667
[16] 1.399000000 -0.454500000 -0.492833333 0.695166667 0.397666667
To replace these numeric values with character labels based on a single threshold (e.g., 1.0), I use the following:
x <- ifelse(RNA$MMP2 <= 1.0, "low", "high")
What if I need to categorize the values into three character labels instead?
a) RNA$MMP2 < 0.5 ,"low";
b) RNA$MMP2 > 0.5 and < 1.0, "medium";
c) RNA$MMP2 > 1.0, "high";
Suggestions will be highly appreciated.
Upvotes: 2
Views: 5074
Reputation: 76402
Here is a findInterval-based solution.
c("low", "medium", "high")[findInterval(RNA$MMP2, c(-Inf, 0.5, 1, Inf))]
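To make the boundary behavior of this approach visible, here is a minimal sketch on a few made-up values (the vector x is hypothetical, not part of the question's data). It relies on findInterval treating intervals as closed on the left by default, so a value of exactly 0.5 lands in "medium" and a value of exactly 1 lands in "high".

```r
# Hypothetical sample values, chosen to sit on and around the two thresholds
x <- c(-1.2, 0.5, 0.75, 1.0, 2.3)

# findInterval() returns, for each value, the index of the interval it falls in.
# By default the intervals are closed on the left: [-Inf, 0.5), [0.5, 1), [1, Inf)
idx <- findInterval(x, c(-Inf, 0.5, 1, Inf))

# Use the interval index to pick the matching label
labels <- c("low", "medium", "high")[idx]

print(data.frame(x, idx, labels))
```

If the thresholds should instead belong to the lower group, findInterval has a left.open argument that flips which side of each interval is closed.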
Since there are already several answers, here are comparative performance tests. The other posts are by akrun (two solutions) and juljo.
Rui <- function(){
  c("low", "medium", "high")[findInterval(RNA$MMP2, c(-Inf, 0.5, 1, Inf))]
}
akrun1 <- function(){
  cut(RNA$MMP2, breaks = c(-Inf, 0.5, 1.0, Inf),
      labels = c("low", "medium", "high"))
}
akrun2 <- function(){
  with(RNA, ifelse(MMP2 < 0.5, "low",
            ifelse(MMP2 >= 0.5 & MMP2 < 1.0, "medium", "high")))
}
library(dplyr)
juljo <- function(){
  RNA %>% mutate(MMP2 = case_when(
    MMP2 < 0.5 ~ "low",
    MMP2 > 0.5 & MMP2 < 1 ~ "medium",
    MMP2 > 1 ~ "high"
  ))
}
library(microbenchmark)
mb <- microbenchmark(
  Rui = Rui(),
  akrun1 = akrun1(),
  akrun2 = akrun2(),
  juljo = juljo()
)
print(mb, unit = "relative", order = "median")
#Unit: relative
# expr min lq mean median uq max neval cld
# Rui 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100 a
# akrun2 5.173591 4.077107 2.148212 2.874767 3.117478 1.701425 100 a
# akrun1 13.333069 10.033994 3.435633 8.305631 8.079830 1.017586 100 a
# juljo 235.851969 232.672643 50.541489 146.773911 142.042876 2.147924 100 b
MMP2 <- scan(text = '
1.506000000 0.957833333 2.285500000 -0.089333333 -1.233166667
1.591500000 -1.396500000 -0.260500000 0.583000000 -0.716333333
1.628833333 -0.390000000 -0.466166667 -0.550666667 1.001666667
1.399000000 -0.454500000 -0.492833333 0.695166667 0.397666667
')
RNA <- data.frame(MMP2)
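As a quick sanity check on this data, the sketch below compares the three base R approaches from this thread and confirms they produce the same labels. It re-creates the 20 values with c() so that it runs on its own; the variable names by_interval, by_cut and by_ifelse are only illustrative. Note that none of the sample values falls exactly on 0.5 or 1, so this does not exercise the boundary cases.

```r
# The 20 MMP2 values from the question
MMP2 <- c(1.506000000, 0.957833333, 2.285500000, -0.089333333, -1.233166667,
          1.591500000, -1.396500000, -0.260500000, 0.583000000, -0.716333333,
          1.628833333, -0.390000000, -0.466166667, -0.550666667, 1.001666667,
          1.399000000, -0.454500000, -0.492833333, 0.695166667, 0.397666667)
RNA <- data.frame(MMP2)

# findInterval-based labels
by_interval <- c("low", "medium", "high")[findInterval(RNA$MMP2, c(-Inf, 0.5, 1, Inf))]

# cut-based labels (cut returns a factor, so convert for comparison)
by_cut <- as.character(cut(RNA$MMP2, breaks = c(-Inf, 0.5, 1.0, Inf),
                           labels = c("low", "medium", "high")))

# nested ifelse labels
by_ifelse <- with(RNA, ifelse(MMP2 < 0.5, "low",
                       ifelse(MMP2 < 1.0, "medium", "high")))

# All three should agree on this data
print(identical(by_interval, by_cut) && identical(by_cut, by_ifelse))

# Number of values in each group
print(table(by_interval))
```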
Upvotes: 4
Reputation: 674
dplyr's case_when() is a great alternative for these types of cases:
library(dplyr)
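RNA %>% mutate(MMP2 = case_when(
  MMP2 < 0.5 ~ "low",
  MMP2 < 1 ~ "medium",   # conditions are checked in order, so this covers 0.5 <= MMP2 < 1
  MMP2 >= 1 ~ "high"     # >= keeps values of exactly 0.5 or 1 from falling through to NA
))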
Upvotes: 5
Reputation: 887118
An option would be cut, which handles multiple conditions in one call:
cut(RNA$MMP2, breaks = c(-Inf, 0.5, 1.0, Inf),
labels = c("low", "medium", "high"))
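One detail worth checking with cut is which side of each interval is closed. By default (right = TRUE) the intervals are closed on the right, so a value of exactly 0.5 is labelled "low"; passing right = FALSE closes them on the left instead, which matches the >= 0.5 condition in the ifelse version. The sketch below illustrates this on a few made-up values (x is hypothetical, not the question's data).

```r
# Hypothetical values, including the two thresholds themselves
x <- c(0.4, 0.5, 0.75, 1.0, 1.5)
breaks <- c(-Inf, 0.5, 1.0, Inf)
labs <- c("low", "medium", "high")

# Default: intervals (-Inf, 0.5], (0.5, 1], (1, Inf)
right_closed <- as.character(cut(x, breaks = breaks, labels = labs))

# right = FALSE: intervals [-Inf, 0.5), [0.5, 1), [1, Inf)
left_closed <- as.character(cut(x, breaks = breaks, labels = labs, right = FALSE))

print(data.frame(x, right_closed, left_closed))
```

Which variant is appropriate depends on where the boundary values are meant to go, which the question leaves open.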
If there are more than two groups and you want to use ifelse, the calls must be nested:
with(RNA, ifelse(MMP2 < 0.5, "low",
          ifelse(MMP2 >= 0.5 & MMP2 < 1.0, "medium", "high")))
Upvotes: 5