Reputation: 301
Looking for some help on extracting the keywords from the text. I have two data frame. The first dataframe has description column and the other data frame has only one column with the keywords.
I want to search the keywords from the dataframe2 on the description field and create a new column in dataframe1 with the matched keywords. If there are multiple keywords I need the newly added column with all the keywords seperated with comma as mentioned below.
Dataframe2
Keywords
New
FUND
EVENT
Author
book
Dataframe1
ID NAME Month DESCRIPTION Keywords
12 x1 Jan funding recived fund
23 x2 Feb author of the book author, book
14 x3 Mar new year event new, event
Also, I need the keywords even the description has complete word. I.e Funding where I can get the keyword fund in the new column.
Upvotes: 2
Views: 371
Reputation: 20095
A solution can be to use stringr::str_detect
to check presence of Keywords
in each DESCRIPTION
.
library(stringr)
df1$Keywords <- mapply(function(x)paste(df2$Keywords[str_detect(x, tolower(df2$Keywords))],
collapse = ","), df1$DESCRIPTION)
df1
# ID NAME Month DESCRIPTION Keywords
# 1 12 x1 Jan funding recived FUND
# 2 23 x2 Feb author of the book Author,book
# 3 14 x3 Mar new year event New,EVENT
Data:
df1 <- read.table(text =
"ID NAME Month DESCRIPTION
12 x1 Jan 'funding recived'
23 x2 Feb 'author of the book'
14 x3 Mar 'new year event'",
header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text =
"Keywords
New
FUND
EVENT
Author
book",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 1
Reputation: 887241
We can use regex_left_join
from fuzzyjoin
and do a group_by
concatenation (paste
)
library(fuzzyjoin)
library(dplyr)
df1 %>%
regex_left_join(df2, by = c('DESCRIPTION' = 'Keywords'),
ignore_case = TRUE) %>%
group_by(ID, NAME, Month, DESCRIPTION) %>%
summarise(Keywords = toString(unique(tolower(Keywords))))
# A tibble: 3 x 5
# Groups: ID, NAME, Month [?]
# ID NAME Month DESCRIPTION Keywords
# <int> <chr> <chr> <chr> <chr>
#1 12 x1 Jan funding recived fund
#2 14 x3 Mar new year event new, event
#3 23 x2 Feb author of the book author, book
df1 <- structure(list(ID = c(12L, 23L, 14L), NAME = c("x1", "x2", "x3"
), Month = c("Jan", "Feb", "Mar"), DESCRIPTION = c("funding recived",
"author of the book", "new year event")), .Names = c("ID", "NAME",
"Month", "DESCRIPTION"), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(Keywords = c("New", "FUND", "EVENT", "Author",
"book")), .Names = "Keywords", class = "data.frame", row.names = c(NA,
-5L))
Upvotes: 4