britt

Using dplyr mutate_at to change specified list of variables with case_when statement

I'm trying to recode some columns in a data set. The columns have unusual names such as S3__8 or C4__2. There are also some categorical columns that start with C, such as Case, which I want to leave alone.

I used this segment to successfully recode all of the S columns:

library(dplyr)

# funs() is deprecated in current dplyr; a ~ lambda does the same job
Sa_Recode <- Sa %>%
  mutate_at(vars(starts_with("S")),
            ~ case_when(grepl("Yes", ., ignore.case = TRUE) ~ "1",
                        grepl("No", ., ignore.case = TRUE) ~ "0",
                        grepl("Some", ., ignore.case = TRUE) ~ "0.5",
                        TRUE ~ "Else"))

I want to recode the C columns as well, but I can't use the same logic because some of my other columns also start with C. I've tried editing the mutate_at() line in the following ways, with no luck:

Creating a character vector of the columns I need:

list <- c('C1_(*)__', 'C2_4__', 'C3_(*)__', 'C3a_(*)__') 
mutate_at(vars(list),

Listing them directly as a character vector:

mutate_at(c('C1_(*)__', 'C2_4__', 'C3_(*)__', 'C3a_(*)__'),

Listing them differently, wrapped in vars():

mutate_at(vars(c('C1_(*)__', 'C2_4__', 'C3_(*)__', 'C3a_(*)__')),

Selecting a range of columns by position:

mutate_at(Sa[,8:53],
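For reference, `mutate_at()` accepts a character vector of exact column names or a numeric vector of column positions as its `.vars` argument, but not a subset of the data itself such as `Sa[,8:53]`. The sketch below illustrates those forms on a small hypothetical data frame `df` with a hypothetical helper `to_num()`; it assumes dplyr 1.0 or later (for `across()` and `all_of()`) and character, not factor, columns.

```r
library(dplyr)

# Hypothetical toy data standing in for `Sa`
df <- data.frame(Case = c("A", "B"),
                 C1 = c("Yes", "No"),
                 C2 = c("Some", "Yes"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: recode Yes/No/Some, keep anything else unchanged
to_num <- function(x) {
  case_when(grepl("Yes", x, ignore.case = TRUE) ~ "1",
            grepl("No", x, ignore.case = TRUE) ~ "0",
            grepl("Some", x, ignore.case = TRUE) ~ "0.5",
            TRUE ~ x)
}

# 1. A character vector of exact column names
cols <- c("C1", "C2")
by_name <- df %>% mutate_at(cols, to_num)

# 2. Column positions: pass the indices, not a slice of the data frame
by_pos <- df %>% mutate_at(2:3, to_num)

# 3. The across() equivalent, using all_of() for a character vector
by_across <- df %>% mutate(across(all_of(cols), to_num))
```

Names in a character vector are matched literally, so every entry must exist in the data exactly as written; otherwise dplyr raises an error.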

I'll be repeating this process with about nine other sets of columns (each with a different starting letter), so I'm hoping to learn how to adapt the logic. Alternatively, is there a way to make the "else" branch of the case_when statement leave the value unchanged instead of recoding it? That would also fix the issue. Thanks!
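One way to reuse the logic across several sets of columns is to put the recoding in a function and build the selection regex from a vector of prefixes. This is a minimal sketch: `recode_answers()`, the `prefixes` vector and the toy data frame `df` are hypothetical, and it assumes dplyr 1.0 or later and character columns. The `TRUE ~ x` branch returns the original value, so anything that is not Yes/No/Some (for example "Skipped") is left unchanged.

```r
library(dplyr)

# Hypothetical helper: recode Yes/No/Some answers, keep anything else as-is
recode_answers <- function(x) {
  case_when(grepl("Yes", x, ignore.case = TRUE) ~ "1",
            grepl("No", x, ignore.case = TRUE) ~ "0",
            grepl("Some", x, ignore.case = TRUE) ~ "0.5",
            TRUE ~ x)
}

# Hypothetical prefixes for the different question sets
prefixes <- c("S", "C", "D")

# Builds "^(S|C|D)\\d": a prefix letter followed directly by a digit,
# so a column such as "Case" is not selected
pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")\\d")

# Hypothetical toy data standing in for `Sa`
df <- data.frame(Case = c("A", "B"),
                 S3__8 = c("Yes", "Skipped"),
                 C4__2 = c("No", "Some"),
                 D1 = c("Some", "N/A"),
                 stringsAsFactors = FALSE)

recoded <- df %>% mutate(across(matches(pattern), recode_answers))
```

Adding another set of columns then only means adding its letter to `prefixes`.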

Sample Input:
Case  S25_    S26_(*)__   C1_(*)__
A     No      Some        Yes
B     Yes     Skipped     Yes
C     No      N/A         Some

Desired output:
Case  S25_    S26_(*)__   C1_(*)__
A     0       0.5         1
B     1       Skipped     1
C     0       N/A         0.5

Answers (1)

Ronak Shah

You can use a regular expression with matches() to select exactly the columns you want to change.

library(dplyr)
Sa %>%
  mutate_at(vars(matches('^S|C\\d+')),
             ~case_when(grepl("Yes", ., ignore.case = TRUE) ~ "1",
                        grepl("No", ., ignore.case = TRUE) ~ "0",
                        grepl("Some", ., ignore.case = TRUE) ~ "0.5",
                        TRUE ~ "Else"))

This selects columns that start with "S" or that contain "C" followed by a number. Note that matches() ignores case by default.
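To check which names a pattern picks up before running the recode, the same regex can be tested on a vector of names with base R's `grepl()`. This sketch uses the column names from the sample input plus one hypothetical name, `Q2_C3`, to show how the unanchored `C\\d+` differs from an anchored `^(S|C\\d)`; `ignore.case = TRUE` mirrors the default of `matches()`.

```r
# Column names from the sample input, plus one hypothetical extra column
col_names <- c("Case", "S25_", "S26_(*)__", "C1_(*)__", "Q2_C3")

# Pattern from the answer: starts with S, or contains C followed by a digit
unanchored <- col_names[grepl("^S|C\\d+", col_names, ignore.case = TRUE)]

# Anchored variant: the name must start with S, or with C plus a digit
anchored <- col_names[grepl("^(S|C\\d)", col_names, ignore.case = TRUE)]

print(unanchored)
print(anchored)
```

Both patterns skip "Case", but only the unanchored one also selects "Q2_C3", because "C3" appears in the middle of that name.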

Also, mutate_at() has been superseded by across() (since dplyr 1.0.0), so you can now use:

Sa %>%
   mutate(across(matches('^S|C\\d+'),
            ~case_when(grepl("Yes", ., ignore.case = TRUE) ~ "1",
                       grepl("No", ., ignore.case = TRUE) ~ "0",
                       grepl("Some", ., ignore.case = TRUE) ~ "0.5",
                       TRUE ~ "Else")))
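To get the desired output from the question, where values such as "Skipped" and "N/A" are kept, the final branch can return the column itself (`TRUE ~ .x`) instead of "Else". This sketch rebuilds the sample input as a data frame; it assumes the columns are character vectors (wrap `.x` in `as.character()` if they are factors) and dplyr 1.0 or later.

```r
library(dplyr)

# Sample input from the question; check.names = FALSE keeps
# non-syntactic names such as "S26_(*)__" intact
Sa <- data.frame(Case = c("A", "B", "C"),
                 S25_ = c("No", "Yes", "No"),
                 `S26_(*)__` = c("Some", "Skipped", "N/A"),
                 `C1_(*)__` = c("Yes", "Yes", "Some"),
                 check.names = FALSE, stringsAsFactors = FALSE)

Sa_Recode <- Sa %>%
  mutate(across(matches("^S|C\\d+"),
                ~ case_when(grepl("Yes", .x, ignore.case = TRUE) ~ "1",
                            grepl("No", .x, ignore.case = TRUE) ~ "0",
                            grepl("Some", .x, ignore.case = TRUE) ~ "0.5",
                            # fall back to the original value instead of "Else"
                            TRUE ~ .x)))

Sa_Recode
```

In dplyr 1.1.0 and later, `case_when()` also has a `.default` argument, so `.default = .x` is an alternative to the `TRUE ~ .x` branch.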
