Din

Reputation: 81

Adding a tag/category to a dataframe based on a description in R

I have a dataset and I am trying to categorise its rows into specific groups based on the description. Right now I use a combination of grepl, %in% and == to build a list of conditions, and then run the dataframe through a for loop to get the desired result.

I am sure there must be a better way to do this than my current approach. I would appreciate it if someone could suggest other approaches that produce the same output.

The list keeps growing every month as I get new data, so my current approach is becoming painful to maintain. Maybe a mapping file or something similar would work better, but I am not sure.

Below is a sample of what I am currently doing:


df <- data.frame(Description = c('Intelli Software','Ichef','SPM Foods','SPM','NTUC','Lazada','Shoppee','Random'))
df$Category=NA


cat_list=list(c("grepl('Intelli',df$Description)|grepl('Ichef',df$Description)","Software"),
              c("df$Description %in% c('SPM Foods','NTUC','SPM')","Grocery"),
              c("grepl('Lazada|Shoppee',df$Description)","Online Shopping"),
              c("is.na(df$Category)","Unknown Category"))

for (i in seq_along(cat_list)){
  df$Category= ifelse(eval(parse(text = paste0("(",cat_list[[i]][1],")","& is.na(df$Category)"))),
                      cat_list[[i]][2],
                      df$Category)
}

Upvotes: 0

Views: 249

Answers (1)

Susan Switzer

Reputation: 1922

Would a tidyverse solution be helpful?

library(tidyverse)
df <- data.frame(Description = c('Intelli Software',
                                 'Ichef',
                                 'SPM Foods',
                                 'SPM',
                                 'NTUC',
                                 'Lazada',
                                 'Shoppee',
                                 'Random'), 
                 Category = NA)


# case_when() checks the conditions in order and uses the first one that is TRUE
df <- df %>% 
  mutate(Category = case_when(
    str_detect(Description, 'Intelli|Ichef') ~ 'Software', 
    Description %in% c('SPM Foods','NTUC','SPM') ~ 'Grocery', 
    str_detect(Description, 'Lazada|Shoppee') ~ 'Online Shopping', 
    TRUE ~ 'Unknown Category')
  )
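
Since the question mentions a mapping file, here is a minimal base R sketch of that idea. It assumes the rules can all be written as regular expressions and are checked top to bottom, so the first match wins, as in the original loop. The `rules` table and its column names `pattern` and `category` are hypothetical; in practice the table could probably be loaded from a CSV with read.csv() instead of being typed inline.

```r
# Hypothetical mapping table: one row per rule, checked from top to bottom.
# The column names `pattern` and `category` are assumptions for this sketch.
rules <- data.frame(
  pattern  = c("Intelli|Ichef", "^(SPM Foods|NTUC|SPM)$", "Lazada|Shoppee"),
  category = c("Software", "Grocery", "Online Shopping"),
  stringsAsFactors = FALSE
)

df <- data.frame(Description = c("Intelli Software", "Ichef", "SPM Foods", "SPM",
                                 "NTUC", "Lazada", "Shoppee", "Random"),
                 stringsAsFactors = FALSE)

# Start every row with the fallback label, then apply each rule only to
# rows that no earlier rule has claimed.
df$Category <- "Unknown Category"
unassigned <- rep(TRUE, nrow(df))
for (i in seq_len(nrow(rules))) {
  hit <- unassigned & grepl(rules$pattern[i], df$Description)
  df$Category[hit] <- rules$category[i]
  unassigned <- unassigned & !hit
}
```

With this layout, adding a new category each month should only mean appending a row to the table (or the CSV behind it) rather than editing the code. The exact matches from %in% are expressed here as an anchored pattern (^...$), which is one possible way to keep everything in a single column.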

Upvotes: 1
