Octobanana

Reputation: 11

Applying mutate() and case_when() for each element in the dataframe using dplyr/tidyr?

My actual dataset has 1652 rows and 50 columns, but this small example captures the problem. The dataset is df, and I want to end up with newdf. Does anybody know how to do this with dplyr?

df <- cbind(c("HouseCar", "Car", "carmouse", "mouse", NA),
            c("car", NA, "Mousehouse", "housevan", NA),
            c(NA, "mousevan", "Carhouse", NA, "mouse"))

> df
     [,1]       [,2]         [,3]      
[1,] "HouseCar" "car"        NA        
[2,] "Car"      NA           "mousevan"
[3,] "carmouse" "Mousehouse" "Carhouse"
[4,] "mouse"    "housevan"   NA        
[5,] NA         NA           "mouse"   

Desired output (there is a recoding hierarchy House > Vehicle (van, car) > Mouse):

> newdf
     [,1]      [,2]      [,3]     
[1,] "House"   "Vehicle" NA       
[2,] "Vehicle" NA        "Vehicle"
[3,] "Vehicle" "House"   "House"  
[4,] "Mouse"   "House"   NA       
[5,] NA        NA        "Mouse" 

This is how I plan to do it, but I am wondering why this code won't work?

newdf <- df %>% 
  replace_na(., NA_character_) %>% 
  tolower(.) %>% 
  mutate_all(case_when(
    str_detect(., "house") ~ "House",
    str_detect(., "car|van") ~ "Vehicle",
    str_detect(., "mouse") ~ "Mouse",
    TRUE ~ NA_character_
    )
  )
  

I keep getting this error message:

Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "c('matrix', 'array', 'character')"

Upvotes: 0

Views: 735

Answers (1)

LMc

Reputation: 18632

To do this, df needs to be a tibble or data frame. The output of cbind() is a matrix, and dplyr verbs such as mutate_all() only work on data frames and tibbles, which is part of the reason you are getting this error.
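A quick way to confirm this is to inspect the object before piping it into dplyr. The sketch below uses a small made-up matrix; the names m and d are only for illustration and are not from the question.

```r
# Hypothetical two-column example, built the same way as df in the question
m <- cbind(c("HouseCar", NA), c("car", "mouse"))

is.matrix(m)      # TRUE: cbind() on vectors returns a matrix
is.data.frame(m)  # FALSE: so dplyr verbs have no method for it

# Convert before using mutate(); columns get default names V1, V2
d <- as.data.frame(m, stringsAsFactors = FALSE)
is.data.frame(d)  # TRUE
```

Passing stringsAsFactors = FALSE is an assumption on my part to keep the columns as character on R versions before 4.0, where the default was to create factors.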

library(dplyr)
library(stringr)

df <- data.frame(df)  # convert the matrix to a data frame
names(df) <- paste0("col", 1:3)

df %>% 
  mutate(across(everything(), ~ case_when(
    str_detect(.x, "house") ~ "House",
    str_detect(.x, "car|van") ~ "Vehicle",
    str_detect(.x, "mouse") ~ "Mouse",
    TRUE ~ NA_character_
  )))

As of dplyr version 1.0.0, the scoped verbs (those ending in _at, _all, _if) have been superseded by the across() function. You can check your version with packageVersion("dplyr").

dplyr < 1.0.0

df %>% 
  mutate_all(~ case_when(
    str_detect(.x, "house") ~ "House",
    str_detect(.x, "car|van") ~ "Vehicle",
    str_detect(.x, "mouse") ~ "Mouse",
    TRUE ~ NA_character_
  ))

Output (matching is case-sensitive here, so "HouseCar" and "Car" come back as NA)

     col1    col2    col3
1    <NA> Vehicle    <NA>
2    <NA>    <NA> Vehicle
3 Vehicle   House   House
4   Mouse   House    <NA>
5    <NA>    <NA>   Mouse

Update

To ignore case you can do a few things:

  1. As you mentioned in your comments, you can convert everything to lower case before applying your function:
df %>% 
  mutate(across(everything(), tolower)) # pipe to rest of code
  2. You can use stringr::regex to ignore case in your str_detect calls:
df %>% 
  mutate(across(everything(), ~ case_when(
    str_detect(.x, regex("house", ignore_case = TRUE)) ~ "House",
    str_detect(.x, regex("car|van", ignore_case = TRUE)) ~ "Vehicle",
    str_detect(.x, regex("mouse", ignore_case = TRUE)) ~ "Mouse",
    TRUE ~ NA_character_
  )))
  3. The base R function grepl does the same thing with ignore.case = TRUE:
df %>% 
  mutate(across(everything(), ~ case_when(
    grepl("house", .x, ignore.case = TRUE) ~ "House",
    grepl("car|van", .x, ignore.case = TRUE) ~ "Vehicle",
    grepl("mouse", .x, ignore.case = TRUE) ~ "Mouse",
    TRUE ~ NA_character_
  )))
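Putting the pieces together, here is a minimal end-to-end sketch that starts from the cbind() matrix in the question and uses the stringr::regex option. It assumes dplyr >= 1.0.0 and leaves the default column names (V1 to V3) that as.data.frame() assigns; the name m is only for illustration.

```r
library(dplyr)
library(stringr)

# The matrix from the question
m <- cbind(c("HouseCar", "Car", "carmouse", "mouse", NA),
           c("car", NA, "Mousehouse", "housevan", NA),
           c(NA, "mousevan", "Carhouse", NA, "mouse"))

newdf <- as.data.frame(m, stringsAsFactors = FALSE) %>%
  mutate(across(everything(), ~ case_when(
    # Conditions are checked in order, which gives House > Vehicle > Mouse
    str_detect(.x, regex("house", ignore_case = TRUE)) ~ "House",
    str_detect(.x, regex("car|van", ignore_case = TRUE)) ~ "Vehicle",
    str_detect(.x, regex("mouse", ignore_case = TRUE)) ~ "Mouse",
    TRUE ~ NA_character_  # NA input and unmatched strings stay NA
  )))

newdf
```

Because case_when() returns the value for the first condition that is TRUE, the order of the three str_detect() lines is what encodes the recoding hierarchy. If a matrix is needed at the end, as.matrix(newdf) should convert it back.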

Upvotes: 3
