Nathan123

Reputation: 763

How to use string detect with case_when?

I have the following column in my data frame that contains charges:

library(dplyr)
library(stringr)

df <- data.frame(charge = c("trespass-1st degree",
                            "trespass - 1st degree",
                            "rape or attempted rape - 1st degree",
                            "rape or attempt rape 1st degree",
                            "Assault 1st", "Assault 1st"))

                               charge
1                 trespass-1st degree
2               trespass - 1st degree
3 rape or attempted rape - 1st degree
4     rape or attempt rape 1st degree
5                         Assault 1st
6                         Assault 1st

I want to make sure that charges with data entry errors are standardized, e.g. trespass-1st degree vs trespass - 1st degree, and rape or attempted rape - 1st degree vs rape or attempt rape 1st degree.

I tried the following

df%>%
  mutate(charge=
           case_when(str_detect(charge, "^trespass-1st") ~ "Trespass 1st",
                     str_detect(charge,"^rape or attempted rape")~"Rape 1st"))

which gives me the following output

        charge
1 Trespass 1st
2         <NA>
3     Rape 1st
4         <NA>
5         <NA>
6         <NA>

How do I make sure that if two strings such as "trespass" and "1st" are both present, the charge gets tagged as "Trespass 1st", and if "rape" and "1st" are both present in the charge column, it gets tagged as "Rape 1st"?

To get the following df

        charge
1 Trespass 1st
2 Trespass 1st
3     Rape 1st
4     Rape 1st
5  Assault 1st
6  Assault 1st

Upvotes: 1

Views: 629

Answers (1)

akrun

Reputation: 887028

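The issue is that some elements have spaces around the hyphen while others do not (trespass-1st vs trespass - 1st), and some differ in a suffix (attempt vs attempted). The patterns need to allow for both variants, and a final TRUE ~ charge branch keeps the unmatched values instead of turning them into NA.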

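library(dplyr)
library(stringr)

df %>%
  mutate(charge = case_when(
    # \\s* allows optional spaces on either side of the hyphen
    str_detect(charge, "^trespass\\s*-\\s*1st") ~ "Trespass 1st",
    # (ed)? matches both "attempt" and "attempted"
    str_detect(charge, "^rape or attempt(ed)? rape") ~ "Rape 1st",
    # leave every other charge unchanged
    TRUE ~ charge
  ))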
#        charge
#1 Trespass 1st
#2 Trespass 1st
#3     Rape 1st
#4     Rape 1st
#5  Assault 1st
#6  Assault 1st
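
To see why the pattern from the question misses the second row, it may help to test both patterns directly on the two trespass strings. This is only a small check; the vector name x and the variables strict and relaxed are made up for illustration.

```r
library(stringr)

# The two spellings of the trespass charge from the question
x <- c("trespass-1st degree", "trespass - 1st degree")

# Pattern from the question: the hyphen must directly follow "trespass"
strict <- str_detect(x, "^trespass-1st")
strict
# [1]  TRUE FALSE

# Relaxed pattern: \\s* permits zero or more whitespace characters
# on either side of the hyphen
relaxed <- str_detect(x, "^trespass\\s*-\\s*1st")
relaxed
# [1] TRUE TRUE
```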

data

df <- structure(list(charge = c("trespass-1st degree", "trespass - 1st degree", 
"rape or attempted rape - 1st degree", "rape or attempt rape 1st degree", 
"Assault 1st", "Assault 1st")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))
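
If the goal is literally "tag the row when both keywords appear anywhere in the string", as the question asks, one possible alternative is to combine several str_detect() calls with &. The sketch below assumes that keyword presence alone is enough to identify a charge, which may not hold for a larger data set. The helper has_all() and the name result are hypothetical, not part of dplyr or stringr.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  charge = c("trespass-1st degree", "trespass - 1st degree",
             "rape or attempted rape - 1st degree",
             "rape or attempt rape 1st degree",
             "Assault 1st", "Assault 1st"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each element of x that contains
# every keyword (treated as a regex), ignoring case
has_all <- function(x, ...) {
  keywords <- c(...)
  hits <- lapply(keywords, function(k) {
    str_detect(x, regex(k, ignore_case = TRUE))
  })
  Reduce(`&`, hits)
}

result <- df %>%
  mutate(charge = case_when(
    has_all(charge, "trespass", "1st") ~ "Trespass 1st",
    has_all(charge, "rape", "1st") ~ "Rape 1st",
    # keep every other charge as it is
    TRUE ~ charge
  ))

result
#         charge
# 1 Trespass 1st
# 2 Trespass 1st
# 3     Rape 1st
# 4     Rape 1st
# 5  Assault 1st
# 6  Assault 1st
```

Compared with the anchored patterns above, this tolerates any separator or wording between the keywords, at the cost of being looser about what counts as a match.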

Upvotes: 1
