Shahin
Shahin

Reputation: 1316

Replace column values that match multiple different patterns

In the dataframe below:

library(tidyverse)
df <- tibble(notes=c("Positive result","Negative","NEG","POS >2","pOS","Cannot Determine","2.4","3.1","0.2"))

  notes           
  <chr>           
1 Positive result 
2 Negative        
3 NEG             
4 POS >2          
5 pOS             
6 Cannot Determine
7 2.4             
8 3.1             
9 0.2

I would like to define a one-liner to replace the entries in the note column that match a pattern. I would have used the ternary operator if there were only two conditions. But here I have 5.

I am looking to replace values in notes with:

  1. could be turned into a double -> "3"
  2. grepl("pos",tolower(notes)) -> "2"
  3. grepl("neg",tolower(notes)) -> "1"
  4. none of the above -> "0"

I initially did:

df %>%
  mutate(notes=ifelse(grepl("[[:digit:]]+",notes)),"3",notes) %>%  # could be coerced into a double
  mutate(notes=ifelse(grepl("pos",tolower(notes))),"2",notes) %>%  # contains "pos"
  mutate(notes=ifelse(grepl("neg",tolower(notes))),"1",notes) %>%  # contains "neg"
  mutate(notes=ifelse(grepl("3|2|1",tolower(notes))),notes,"0") %>%  # none of the above
  type.convert()

Desired Output

notes           
<dbl>           
1 2 
2 1        
3 1             
4 2          
5 2             
6 0
7 3            
8 3             
9 3             

Upvotes: 3

Views: 167

Answers (1)

akrun
akrun

Reputation: 886948

We can use case_when

library(dplyr)
library(stringr)
df %>% 
  mutate(notes1 = toupper(substr(notes, 1, 3)), 
        notes =case_when(notes1 == "POS" ~ 2, 
                         notes1 == 'NEG' ~ 1, 
                         str_detect(notes, '^[0-9.]+$')~ 3,
                         TRUE ~ 0)) %>%
  select(-notes1)
# A tibble: 9 x 1
#  notes
#  <dbl>
#1     2
#2     1
#3     1
#4     2
#5     2
#6     0
#7     3
#8     3
#9     3

If we need to keep the numeric values as such, one option is as.numeric and then coalesce

df %>% 
 mutate(notes1 = toupper(substr(notes, 1, 3)), 
      notes2 =case_when(notes1 == "POS" ~ 2, 
                       notes1 == 'NEG' ~ 1, 
                       str_detect(notes, '^[0-9.]+$')~ 3,
                       TRUE ~ 0)) %>%
 select(-notes1) %>% 
 mutate(notes = coalesce(as.numeric(notes), notes2)) %>% 
 select(-notes2)

Upvotes: 4

Related Questions