Vinay

Reputation: 477

Ifelse() with three conditions in R

I have the following data:

    RNA$MMP2
 [1]  1.506000000  0.957833333  2.285500000 -0.089333333 -1.233166667
 [6]  1.591500000 -1.396500000 -0.260500000  0.583000000 -0.716333333
[11]  1.628833333 -0.390000000 -0.466166667 -0.550666667  1.001666667
[16]  1.399000000 -0.454500000 -0.492833333  0.695166667  0.397666667  

To replace these numeric values with character labels based on a single threshold (e.g., 1.0), I use the following:

x <- ifelse(RNA$MMP2 <= 1.0, "low", "high")

What if I need to sort the values into three categories instead:

a) RNA$MMP2 < 0.5 ,"low"; 

b) RNA$MMP2 > 0.5 and < 1.0, "medium";

c) RNA$MMP2 > 1.0, "high";

Suggestions will be highly appreciated.

Upvotes: 2

Views: 5074

Answers (3)

Rui Barradas

Reputation: 76402

Here is a findInterval-based solution.

c("low", "medium", "high")[findInterval(RNA$MMP2, c(-Inf, 0.5, 1, Inf))]
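
To make the indexing trick easier to follow, here is a minimal sketch of what findInterval returns before it is used as a subscript. The vector x is hypothetical (it is not the question's data) and was chosen to include both break points; by default the intervals are closed on the left.

```r
# Hypothetical values, chosen to include both break points (0.5 and 1).
x <- c(-1.2, 0.5, 0.7, 1.0, 2.3)

# findInterval() returns, for each value, the index of the interval it falls in.
# With the default arguments the intervals are [-Inf, 0.5), [0.5, 1), [1, Inf).
idx <- findInterval(x, c(-Inf, 0.5, 1, Inf))
print(idx)

# Subsetting the label vector with those indices yields the categories.
labels <- c("low", "medium", "high")[idx]
print(labels)
```

Because the result is a plain character vector built by subsetting, no factor conversion is involved.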

Tests

Since there are already several answers, here are comparative performance tests. The other posts are by akrun (two solutions) and juljo.

Rui <- function(){
  c("low", "medium", "high")[findInterval(RNA$MMP2, c(-Inf, 0.5, 1, Inf))]
}

akrun1 <- function(){
  cut(RNA$MMP2, breaks = c(-Inf, 0.5, 1.0, Inf), 
      labels = c("low", "medium", "high"))
}

akrun2 <- function(){
  with(RNA, ifelse(MMP2 < 0.5, "low", 
                   ifelse(MMP2 >= 0.5 & MMP2 < 1.0, "medium", "high")))
}

library(dplyr)
juljo <- function(){
  RNA %>% mutate(MMP2 = case_when(
    MMP2 < 0.5 ~ "low",
    MMP2 > 0.5 & MMP2 < 1 ~ "medium",
    MMP2 > 1 ~ "high"
  ))
}

library(microbenchmark)
mb <- microbenchmark(
  Rui = Rui(), 
  akrun1 = akrun1(),
  akrun2 = akrun2(),
  juljo = juljo()
)
print(mb, unit = "relative", order = "median")
#Unit: relative
#   expr        min         lq      mean     median         uq      max neval cld
#    Rui   1.000000   1.000000  1.000000   1.000000   1.000000 1.000000   100  a 
# akrun2   5.173591   4.077107  2.148212   2.874767   3.117478 1.701425   100  a 
# akrun1  13.333069  10.033994  3.435633   8.305631   8.079830 1.017586   100  a 
#  juljo 235.851969 232.672643 50.541489 146.773911 142.042876 2.147924   100   b
  

Data

MMP2 <- scan(text = '  
1.506000000  0.957833333  2.285500000 -0.089333333 -1.233166667
1.591500000 -1.396500000 -0.260500000  0.583000000 -0.716333333
1.628833333 -0.390000000 -0.466166667 -0.550666667  1.001666667
1.399000000 -0.454500000 -0.492833333  0.695166667  0.397666667
')
RNA <- data.frame(MMP2)

Upvotes: 4

juljo

Reputation: 674

dplyr's case_when() is a great alternative for this type of case:

library(dplyr)

RNA %>% mutate(MMP2 = case_when(
  MMP2 < 0.5 ~ "low",
  MMP2 < 1 ~ "medium",   # only reached when MMP2 >= 0.5, so exactly 0.5 is not left as NA
  MMP2 >= 1 ~ "high"     # includes exactly 1; missing values stay NA
))

Upvotes: 5

akrun

Reputation: 887118

One option is cut, which handles any number of intervals in a single call and returns a factor:

cut(RNA$MMP2, breaks = c(-Inf, 0.5, 1.0, Inf), 
    labels = c("low", "medium", "high"))

With ifelse, more than two groups require nesting:

with(RNA, ifelse(MMP2 < 0.5, "low", 
          ifelse(MMP2 >= 0.5 & MMP2 < 1.0, "medium", "high")))
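
One difference between the two approaches is worth checking. By default cut builds right-closed intervals, so a value of exactly 0.5 is labelled "low" and exactly 1.0 "medium", whereas the nested ifelse puts them in "medium" and "high". A minimal sketch, assuming a hypothetical vector x that holds only the two boundary values:

```r
# Hypothetical input: just the two break points.
x <- c(0.5, 1.0)
labs <- c("low", "medium", "high")
brks <- c(-Inf, 0.5, 1.0, Inf)

# Default (right = TRUE): intervals are (-Inf, 0.5], (0.5, 1], (1, Inf].
right_closed <- as.character(cut(x, breaks = brks, labels = labs))

# right = FALSE: intervals are [-Inf, 0.5), [0.5, 1), [1, Inf).
left_closed <- as.character(cut(x, breaks = brks, labels = labs, right = FALSE))

# Nested ifelse with the same boundary handling as the answer above.
nested <- ifelse(x < 0.5, "low", ifelse(x < 1.0, "medium", "high"))

print(right_closed)
print(left_closed)
print(nested)
```

If the two results need to agree on values that sit exactly on a break, passing right = FALSE to cut is one way to get there. The question's sample data contains no such values, so both give the same labels on it.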

Upvotes: 5
