emlab

Reputation: 23

How can I split variable X into 2 variables depending on a character in X?

I have a variable that looks like this:

df$Code
22
34
24
12
44

How can I create a new variable in the data frame so that a subject whose df$Code contains a "4" is grouped as "Patient", while everyone else is grouped as "Control", in a new df$Groups?

df$Groups
Control
Patient
Patient
Control
Patient

Thank you!

Upvotes: 1

Views: 85

Answers (4)

Peter H.

Reputation: 2164

Alternatively, a function such as recode() is well suited to this, especially if you have more than two categories.

library(tidyverse)

tibble(code = c(22, 34, 24, 12, 44)) %>% 
  mutate(
    # code %% 10 is the last digit; a 4 marks a patient, as in the question
    group = recode(code %% 10, `2` = "Control", `4` = "Patient")
  )

#> # A tibble: 5 x 2
#>    code group  
#>   <dbl> <chr>  
#> 1    22 Control
#> 2    34 Patient
#> 3    24 Patient
#> 4    12 Control
#> 5    44 Patient

Created on 2021-07-15 by the reprex package (v1.0.0)
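
To illustrate the "more than two categories" point, here is a minimal sketch that adds a fallback label through recode()'s .default argument. The extra code 37 and the label "Other" are made up for illustration and are not part of the question's data. As far as I know, a last digit without a mapping would otherwise come back as NA here.

```r
library(dplyr)

# Hypothetical codes; 37 ends in a digit that has no explicit mapping.
codes <- c(22, 34, 24, 12, 44, 37)

groups <- recode(
  codes %% 10,          # last digit of each code
  `2` = "Control",
  `4` = "Patient",
  .default = "Other"    # hypothetical fallback label for any other digit
)
groups
```

Inside mutate() the same call would take the column name in place of codes.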

Upvotes: 1

TarJae

Reputation: 78917

We could use grepl in combination with ifelse:

library(dplyr)
df  %>% 
  mutate(Groups = ifelse(
    grepl("4", as.character(Code)), 'Patient', 'Control'))

Output:

   Code Groups
  <dbl> <chr>  
1    22 Control
2    34 Patient
3    24 Patient
4    12 Control
5    44 Patient

Upvotes: 0

GKi

Reputation: 39647

If only the last digit should be tested for a 4, endsWith or grepl can be used:

c("Control", "Patient")[1 + endsWith(as.character(df$Code), "4")]
#[1] "Control" "Patient" "Patient" "Control" "Patient"

c("Control", "Patient")[1 + grepl("4$", df$Code)]
#[1] "Control" "Patient" "Patient" "Control" "Patient"

Or, to match a 4 at any position:

c("Control", "Patient")[1 + grepl("4", df$Code)]
#[1] "Control" "Patient" "Patient" "Control" "Patient"

Data:

df <- data.frame(Code = c(22, 34, 24, 12, 44))
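
The calls above only print a character vector. To get the df$Groups column the question asks for, the result can be assigned back to the data frame. A minimal sketch, assuming the last-digit rule; is_patient is just a helper name introduced here:

```r
df <- data.frame(Code = c(22, 34, 24, 12, 44))

# TRUE where the code ends in 4; adding 1 turns FALSE/TRUE into index 1/2,
# which picks "Control" or "Patient" from the label vector.
is_patient <- endsWith(as.character(df$Code), "4")
df$Groups <- c("Control", "Patient")[1 + is_patient]
df
```

The indexing works because a logical is coerced to 0 or 1 in arithmetic, so 1 + is_patient is always a valid position in the two-element label vector.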

Upvotes: 4

bird

Reputation: 3294

Using tidyverse:

library(tidyverse)
df %>%
  mutate(group = ifelse(str_detect(as.character(Code), "4"), "Patient", "Control"))

Output:

   Code group  
  <dbl> <chr>  
1    22 Control
2    34 Patient
3    24 Patient
4    12 Control
5    44 Patient

Note that this detects a "4" whether it comes first (e.g. 42) or second (e.g. 24), as I assumed this is what you want. If only the last digit should match, use:

df %>%
  mutate(group = ifelse(str_ends(as.character(Code), "4"), "Patient", "Control"))
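
The sample data cannot tell the two rules apart, because every code that contains a 4 also ends in one. The sketch below uses a made-up code, 42, to show where they differ. To keep it dependency-free it uses base R grepl, with the patterns "4" and "4$" standing in for str_detect and str_ends respectively.

```r
# Hypothetical codes; 42 contains a 4 but does not end in one.
codes <- c(22, 34, 42)

# "4" matches anywhere in the code, "4$" only at the end.
any_position <- ifelse(grepl("4", codes), "Patient", "Control")
last_digit   <- ifelse(grepl("4$", codes), "Patient", "Control")

data.frame(codes, any_position, last_digit)
```

Only the row for 42 differs between the two columns, so which rule to pick depends on what the first digit of the code means.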

Upvotes: 3
