Reputation: 81
I have a dataset and I am trying to categorise its rows into specific groups based on the description. Right now I store conditions built from a combination of grepl, %in% and == in a list, and then run the data frame through a for loop to get my desired result.
I am sure there must be a better way to do this than my current approach. I would appreciate it if someone could suggest other approaches that produce the same output.
The list grows every month as I get new data, so my current approach has become painful to maintain. Maybe a better way is to use a mapping file or something similar, but I am not sure.
Below is a sample of what I am currently doing:
df <- data.frame(Description = c('Intelli Software','Ichef','SPM Foods','SPM','NTUC','Lazada','Shoppee','Random'))
df$Category=NA
cat_list=list(c("grepl('Intelli',df$Description)|grepl('Ichef',df$Description)","Software"),
              c("df$Description %in% c('SPM Foods','NTUC','SPM')","Grocery"),
              c("grepl('Lazada|Shoppee',df$Description)","Online Shopping"),
              c("is.na(df$Category)","Unknown Category"))
for (i in 1:length(cat_list)){
  df$Category= ifelse(eval(parse(text = paste0("(",cat_list[[i]][1],")","& is.na(df$Category)"))),
                      cat_list[[i]][2],
                      df$Category)
}
Upvotes: 0
Views: 249
Reputation: 1922
Would a tidyverse solution be helpful?
library(tidyverse)
df <- data.frame(Description = c('Intelli Software',
                                 'Ichef',
                                 'SPM Foods',
                                 'SPM',
                                 'NTUC',
                                 'Lazada',
                                 'Shoppee',
                                 'Random'),
                 Category = NA)
df <- df %>%
  # case_when() assigns the first matching condition, which mirrors the
  # is.na(df$Category) guard in the original loop
  mutate(Category = case_when(
    # 'Intelli' (not 'Intelli Software') matches the pattern used in the question
    str_detect(Description, 'Intelli|Ichef') ~ 'Software',
    Description %in% c('SPM Foods','NTUC','SPM') ~ "Grocery",
    str_detect(Description, 'Lazada|Shoppee') ~ 'Online Shopping',
    # fallback for rows that matched none of the rules above
    TRUE ~ 'Unknown Category')
  )
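Since the question mentions that the rule list grows every month, a mapping table might be easier to maintain than conditions written into the code. Below is a minimal base R sketch of that idea. The `rules` data frame and its `pattern` and `category` columns are hypothetical names chosen for illustration, and the exact matches from the question are expressed as anchored regular expressions so that every rule can go through `grepl()`. In practice the table could probably live in a CSV file and be loaded with `read.csv()`.

```r
# Hypothetical mapping table (names are assumptions, not from the question):
# one row per rule, checked from top to bottom.
rules <- data.frame(
  pattern  = c("Intelli|Ichef", "^(SPM Foods|NTUC|SPM)$", "Lazada|Shoppee"),
  category = c("Software", "Grocery", "Online Shopping"),
  stringsAsFactors = FALSE
)

df <- data.frame(
  Description = c("Intelli Software", "Ichef", "SPM Foods", "SPM",
                  "NTUC", "Lazada", "Shoppee", "Random"),
  stringsAsFactors = FALSE
)

# Start with every row unassigned.
df$Category <- NA_character_

# Apply each rule only to rows that no earlier rule has claimed.
for (i in seq_len(nrow(rules))) {
  hit <- is.na(df$Category) & grepl(rules$pattern[i], df$Description)
  df$Category[hit] <- rules$category[i]
}

# Anything still unassigned falls into the default group.
df$Category[is.na(df$Category)] <- "Unknown Category"
```

With this layout, adding a new category next month should only mean adding a row to `rules`, without touching the loop and without `eval(parse())`.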
Upvotes: 1