Adreeta

Reputation: 21

Is there a function in R tidyverse to categorize character values of a column based on key words and assign a category?

For example:

dataframe 1 has:

Keyword <- c("dog", "cat", "tiger", "cheetah", "man")
Category <- c("walk", "house", "jungle", "fast", "office")

and I have a second dataframe 2 with a description column, for example:

description <- c("dog is barking", "cat is purring", "tiger is hunting",
"cheetah is running", "man is working")

I want to write a function that searches the description column of dataframe 2 for the keywords in dataframe 1 and returns the matching category. How do I do this using tidyverse? Thanks!

Upvotes: 1

Views: 225

Answers (1)

Anoushiravan R

Reputation: 21908

This isn't elegant, but it works even when a description contains none of the keywords:

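library(dplyr)
library(purrr)
library(tidyr)  # provides unnest_wider()

# df3 is not listed under Data; judging by the output below, it is df2 with
# the last description replaced by "anoush is working" (no keyword match)
df3 %>%
  rowwise() %>%
  # split each description into its words
  mutate(output = strsplit(des, "\\s+", perl = TRUE)) %>%
  # one column per word (output_1, output_2, ...); newer tidyr needs names_sep here
  unnest_wider(col = output, names_sep = "_") %>%
  mutate(Category = pmap_chr(select(cur_data(), !des), ~ {
    # category of whichever keyword appears among this row's words
    x <- df$Category[df$Keyword %in% c(...)]
    if (length(x) != 0) {
      x
    } else {
      NA_character_
    }
  })) %>%
  select(des, Category)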

# A tibble: 5 x 2
  des                Category
  <chr>              <chr>   
1 dog is barking     walk    
2 cat is purring     house   
3 tiger is hunting   jungle  
4 cheetah is running fast    
5 anoush is working  NA 

Data

> dput(df2)
structure(list(des = c("dog is barking", "cat is purring", "tiger is hunting", 
"cheetah is running", "man is working")), row.names = c(NA, -5L
), class = "data.frame")

> dput(df)
structure(list(Category = c("walk", "house", "jungle", "fast", 
"office"), Keyword = c("dog", "cat", "tiger", "cheetah", "man"
)), class = "data.frame", row.names = c(NA, -5L))
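
A possibly simpler alternative, offered only as a sketch: build one regular expression from the keywords, pull the first keyword out of each description with stringr's str_extract(), and look up its category with a left_join(). This assumes the keywords contain no regex metacharacters and that only the first matching keyword per description matters. The names pattern and result are just illustrative, and the last row of df2 is changed here to "anoush is working" to show the no-match case.

```r
library(dplyr)
library(stringr)

# Keyword lookup table (same content as df in the Data section)
df <- data.frame(
  Category = c("walk", "house", "jungle", "fast", "office"),
  Keyword  = c("dog", "cat", "tiger", "cheetah", "man"),
  stringsAsFactors = FALSE
)

# Descriptions; the last row is altered (assumption) so that no keyword matches
df2 <- data.frame(
  des = c("dog is barking", "cat is purring", "tiger is hunting",
          "cheetah is running", "anoush is working"),
  stringsAsFactors = FALSE
)

# One pattern matching any keyword as a whole word: \b(dog|cat|...)\b
pattern <- paste0("\\b(", paste(df$Keyword, collapse = "|"), ")\\b")

result <- df2 %>%
  mutate(Keyword = str_extract(des, pattern)) %>%  # first keyword found, else NA
  left_join(df, by = "Keyword") %>%                # attach the keyword's category
  select(des, Category)

print(result)
```

Rows without any keyword end up with NA in Category, because str_extract() returns NA and the join finds no partner for it. If a description could contain several keywords, str_extract_all() followed by unnesting would be one way to keep them all.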

Upvotes: 2
