Extract matched keyword for the text

Question

Looking for some help on extracting the keywords from the text. I have two data frame. The first dataframe has description column and the other data frame has only one column with the keywords.

I want to search the keywords from the dataframe2 on the description field and create a new column in dataframe1 with the matched keywords. If there are multiple keywords I need the newly added column with all the keywords seperated with comma as mentioned below.

Dataframe2

Keywords
New
FUND
EVENT 
Author
book

Dataframe1

ID  NAME    Month   DESCRIPTION              Keywords
12  x1       Jan    funding recived            fund
23  x2       Feb    author of the book     author, book
14  x3       Mar    new year event         new, event

Also, I need the keywords even the description has complete word. I.e Funding where I can get the keyword fund in the new column.

akrun · Accepted Answer

We can use regex_left_join from fuzzyjoin and do a group_by concatenation (paste)

library(fuzzyjoin)
library(dplyr)
df1 %>% 
   regex_left_join(df2, by = c('DESCRIPTION' = 'Keywords'), 
              ignore_case = TRUE) %>% 
   group_by(ID, NAME, Month, DESCRIPTION) %>% 
   summarise(Keywords = toString(unique(tolower(Keywords))))
# A tibble: 3 x 5
# Groups:   ID, NAME, Month [?]
#     ID NAME  Month DESCRIPTION        Keywords    
#                          
#1    12 x1    Jan   funding recived    fund        
#2    14 x3    Mar   new year event     new, event  
#3    23 x2    Feb   author of the book author, book

data

df1 <- structure(list(ID = c(12L, 23L, 14L), NAME = c("x1", "x2", "x3"
), Month = c("Jan", "Feb", "Mar"), DESCRIPTION = c("funding recived", 
"author of the book", "new year event")), .Names = c("ID", "NAME", 
"Month", "DESCRIPTION"), class = "data.frame", row.names = c(NA, 
-3L))

df2 <- structure(list(Keywords = c("New", "FUND", "EVENT", "Author", 
"book")), .Names = "Keywords", class = "data.frame", row.names = c(NA, 
-5L))

Extract matched keyword for the text

Answers (2)

data

Related Questions