Suppose you have a string-heavy dataframe:
x <- data.frame(name = c("Alice", "Alice", "Alice", "Bob", "Bob", "Charlie"),
prod = c("Hard Hat", "Goggles", "Bus Fare", "Goggles", "Training", "Hard Hat, Laptop"))
How can I add a mutated column (let's call it category) to this dataframe that categorises the data based on some arbitrary criteria? For example, how can I set x$category to "PPE" if 'Hard Hat' or 'Goggles' appears in x$prod, but to "IT" if 'Laptop' appears in x$prod?
In addition, I would like the matching to handle partial matches and different cases, if possible. For example, 'Bus Fare' could also be entered as (non-exhaustive list) 'Bus Ticket', 'BUS FARE' or 'Bus TICKET'; in any of these cases I'd need to categorise it as 'Transport', since the word 'Bus' is present.
Expected output:
name prod category
1 Alice Hard Hat PPE
2 Alice Goggles PPE
3 Alice Bus Fare TRANSPORT
4 Bob Goggles PPE
5 Bob Training TRAINING
6 Charlie Laptop IT
I would ideally like to solve this within the tidyverse. I think it will require a combination of mutate() and various stringr functions, but I can't quite figure out the exact workflow.
Given your situation, you will probably need to create a vector of keywords for each category and use str_detect() with each vector's keywords concatenated by | (the regex alternation operator, i.e. "or"):
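As a quick sketch of the two building blocks (the fares vector is made up from the spellings listed in the question), paste(..., collapse = "|") turns a keyword vector into a single pattern, and case-insensitivity can come either from tolower() or from stringr's regex(ignore_case = TRUE) modifier:

```r
library(stringr)

# Keyword vector for one category, in lower case
ppe <- c("goggles", "hard hat")

# Collapse into a single alternation pattern: "goggles|hard hat"
ppe_pattern <- paste(ppe, collapse = "|")
print(ppe_pattern)

# Hypothetical inputs: spellings of the transport item taken from the question
fares <- c("Bus Fare", "Bus Ticket", "BUS FARE", "Bus TICKET", "Hard Hat")

# Option 1: lower-case the text first, then match lower-case keywords
lower_hits <- str_detect(tolower(fares), "bus")

# Option 2: keep the text as-is and make the regex itself case-insensitive
ci_hits <- str_detect(fares, regex("bus", ignore_case = TRUE))

print(lower_hits)  # TRUE for the four bus variants, FALSE for "Hard Hat"
print(ci_hits)     # same result as option 1
```

Because the keywords are pasted straight into a regular expression, any keyword containing regex metacharacters such as "." or "(" would need escaping first.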
library(dplyr)
library(stringr)

x <- data.frame(name = c("Alice", "Alice", "Alice", "Bob", "Bob", "Charlie"),
                prod = c("Hard Hat", "Goggles", "Bus Fare", "Goggles", "Training", "Hard Hat, Laptop"))

# One vector of lower-case keywords per category
transport <- c("bus")
ppe <- c("goggles", "hard hat")
tech <- c("laptop")
training <- c("training")

# tolower() makes the match case-insensitive; paste(collapse = "|") turns
# each keyword vector into a regex alternation such as "goggles|hard hat"
x <- x %>%
  mutate(
    category = case_when(
      str_detect(tolower(prod), paste(transport, collapse = "|")) ~ "TRANSPORT",
      str_detect(tolower(prod), paste(ppe, collapse = "|")) ~ "PPE",
      str_detect(tolower(prod), paste(tech, collapse = "|")) ~ "IT",
      str_detect(tolower(prod), paste(training, collapse = "|")) ~ "TRAINING"
    )
  )
> x
name prod category
1 Alice Hard Hat PPE
2 Alice Goggles PPE
3 Alice Bus Fare TRANSPORT
4 Bob Goggles PPE
5 Bob Training TRAINING
6 Charlie Hard Hat, Laptop PPE
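Note that case_when() returns the value of the first condition that is TRUE, so row 6 ("Hard Hat, Laptop") is labelled PPE above, whereas the question's expected output lists it as IT. A minimal sketch of giving IT priority is simply to move its condition first. This variant also uses regex(ignore_case = TRUE) instead of tolower(), a small helper keyword_pattern() (a hypothetical name, not part of stringr), and a fallback label "OTHER" (also hypothetical, not from the question) for rows that match no keyword:

```r
library(dplyr)
library(stringr)

x <- data.frame(name = c("Alice", "Alice", "Alice", "Bob", "Bob", "Charlie"),
                prod = c("Hard Hat", "Goggles", "Bus Fare", "Goggles", "Training", "Hard Hat, Laptop"))

# Hypothetical helper: build a case-insensitive alternation pattern from keywords
keyword_pattern <- function(words) {
  regex(paste(words, collapse = "|"), ignore_case = TRUE)
}

x <- x %>%
  mutate(
    category = case_when(
      # Conditions are checked in order and the first TRUE one wins,
      # so IT is listed first to take priority over PPE
      str_detect(prod, keyword_pattern(c("laptop"))) ~ "IT",
      str_detect(prod, keyword_pattern(c("bus"))) ~ "TRANSPORT",
      str_detect(prod, keyword_pattern(c("goggles", "hard hat"))) ~ "PPE",
      str_detect(prod, keyword_pattern(c("training"))) ~ "TRAINING",
      # Fallback for rows matching no keyword ("OTHER" is a hypothetical label)
      TRUE ~ "OTHER"
    )
  )

# Should give "PPE" "PPE" "TRANSPORT" "PPE" "TRAINING" "IT"
x$category
```

Using TRUE ~ rather than the newer .default argument keeps the fallback working on dplyr versions older than 1.1.0.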