Reputation: 11
My real dataset has 1652 rows and 50 columns, but this small example captures the essence. The dataset is df, and I wish to end up with newdf. Does anybody know how to solve this using dplyr?
df <- cbind(c("HouseCar", "Car", "carmouse", "mouse", NA),
c("car", NA, "Mousehouse", "housevan", NA),
c(NA, "mousevan", "Carhouse", NA, "mouse"))
df
     [,1]       [,2]         [,3]
[1,] "HouseCar" "car"        NA
[2,] "Car"      NA           "mousevan"
[3,] "carmouse" "Mousehouse" "Carhouse"
[4,] "mouse"    "housevan"   NA
[5,] NA         NA           "mouse"
Desired output (there is a recoding hierarchy House > Vehicle (van, car) > Mouse):
> newdf
     [,1]      [,2]      [,3]
[1,] "House"   "Vehicle" NA
[2,] "Vehicle" NA        "Vehicle"
[3,] "Vehicle" "House"   "House"
[4,] "Mouse"   "House"   NA
[5,] NA        NA        "Mouse"
This is how I plan to do it, but the code does not work, and I am wondering why:
newdf <- df %>%
replace_na(., NA_character_) %>%
tolower(.) %>%
mutate_all(case_when(
str_detect(., "house") ~ "House",
str_detect(., "car|van") ~ "Vehicle",
str_detect(., "mouse") ~ "Mouse",
TRUE ~ NA_character_
)
)
I keep getting this error message:
Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "c('matrix', 'array', 'character')"
Upvotes: 0
Reputation: 18632
To do this, df needs to be a tibble or data frame. The output of cbind is a matrix, which is in part why you are getting your error: dplyr verbs such as mutate_all have no method for a matrix.
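The matrix-versus-data-frame point can be checked directly. This is a minimal sketch using only base R; the names m and d are illustrative and not from the question.

```r
# cbind() on character vectors returns a character matrix, not a data frame
m <- cbind(c("HouseCar", "Car"), c("car", NA))
class(m)            # includes "matrix"
is.data.frame(m)    # FALSE

# as.data.frame() turns each matrix column into a data frame column
d <- as.data.frame(m, stringsAsFactors = FALSE)
is.data.frame(d)    # TRUE
names(d)            # "V1" "V2"
```

Once the object is a data frame, dplyr verbs can dispatch on it.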
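library(dplyr)
library(stringr)

# Convert the matrix to a data frame and give the columns names
df <- data.frame(df)
names(df) <- paste0("col", 1:3)

# Recode every column; the first matching condition wins
df %>%
  mutate(across(everything(), ~ case_when(
    str_detect(.x, "house") ~ "House",
    str_detect(.x, "car|van") ~ "Vehicle",
    str_detect(.x, "mouse") ~ "Mouse",
    T ~ NA_character_
  )))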
As of dplyr 1.0.0, nearly all scoped verbs with the suffixes _at, _all, and _if have been superseded by across(), which is used inside verbs such as mutate. You can check your version with packageVersion("dplyr").
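If the same script has to run on both older and newer dplyr installations, the version check can drive the choice of API. This is only a sketch: mutate_every is a hypothetical helper name, not a dplyr function, and the toy data frame d is made up for illustration.

```r
library(dplyr)

# Hypothetical helper: apply f to every column using whichever API
# the installed dplyr version supports
mutate_every <- function(data, f) {
  if (packageVersion("dplyr") >= "1.0.0") {
    mutate(data, across(everything(), f))
  } else {
    mutate_all(data, f)
  }
}

d <- data.frame(col1 = c("a", "b"), col2 = c("c", "d"),
                stringsAsFactors = FALSE)
out <- mutate_every(d, toupper)
out
```

In practice most code can simply use across(), since mutate_all still works in current dplyr but is no longer the recommended form.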
dplyr < 1.0.0
df %>%
  mutate_all(~ case_when(
    str_detect(.x, "house") ~ "House",
    str_detect(.x, "car|van") ~ "Vehicle",
    str_detect(.x, "mouse") ~ "Mouse",
    T ~ NA_character_
  ))
Output
     col1    col2    col3
1    <NA> Vehicle    <NA>
2    <NA>    <NA> Vehicle
3 Vehicle   House   House
4   Mouse   House    <NA>
5    <NA>    <NA>   Mouse
Update
To ignore case you can do a few things:
Convert everything to lowercase first:
df %>%
  mutate(across(everything(), tolower)) # pipe to rest of code
Use stringr::regex to ignore case on your str_detect:
df %>%
  mutate(across(everything(), ~ case_when(
    str_detect(.x, regex("house", ignore_case = T)) ~ "House",
    str_detect(.x, regex("car|van", ignore_case = T)) ~ "Vehicle",
    str_detect(.x, regex("mouse", ignore_case = T)) ~ "Mouse",
    T ~ NA_character_
  )))
grepl does the same thing:
df %>%
  mutate(across(everything(), ~ case_when(
    grepl("house", .x, ignore.case = T) ~ "House",
    grepl("car|van", .x, ignore.case = T) ~ "Vehicle",
    grepl("mouse", .x, ignore.case = T) ~ "Mouse",
    T ~ NA_character_
  )))
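Putting the pieces together, here is an end-to-end sketch that starts from the matrix in the question and arrives at the desired newdf. It assumes dplyr >= 1.0.0 and stringr are installed; recode_category, m, and result are illustrative names, not part of either package.

```r
library(dplyr)
library(stringr)

# Rebuild the matrix from the question
m <- cbind(c("HouseCar", "Car", "carmouse", "mouse", NA),
           c("car", NA, "Mousehouse", "housevan", NA),
           c(NA, "mousevan", "Carhouse", NA, "mouse"))

# Hypothetical helper: recode one character vector. case_when() uses the
# first condition that is TRUE, which gives House > Vehicle > Mouse.
recode_category <- function(x) {
  case_when(
    str_detect(x, regex("house", ignore_case = TRUE))   ~ "House",
    str_detect(x, regex("car|van", ignore_case = TRUE)) ~ "Vehicle",
    str_detect(x, regex("mouse", ignore_case = TRUE))   ~ "Mouse",
    TRUE ~ NA_character_
  )
}

# Matrix -> data frame -> recode every column
newdf <- as.data.frame(m, stringsAsFactors = FALSE) %>%
  mutate(across(everything(), recode_category))

# Optional: convert back to an unnamed character matrix
result <- unname(as.matrix(newdf))
result
```

The order of the case_when conditions carries the hierarchy, so reordering them would change which label a value like "carmouse" receives.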
Upvotes: 3