R: Match a character vector to text description in dataframe and return value

Question

I have a data frame with 2 columns and 20 rows: the title of an article and a description of that article. I have a few keywords that I would like to match against these 2 columns. If a keyword matches, the result should be 1, otherwise 0. I tried a simple expression like (my.df == "human") + 0. However, it does not work as expected: it only finds exact matches, so it returns 0 even though the word human appears somewhere in the title. Any suggestions and help are appreciated. Thank you.

Below is an example:

my.keyword<- c("human", "lung", "mutation", "chromosome")
# sample df. created from web

my.df 
Title                           Description
Atlas of mutations in         Lung cancer is the leading cause of cancer
                              related mortality in the United States, with
                              an estimated 221,200 new cases and 158,040
                              deaths anticipated in 2015 (ACS 2015).

The complexity increases because I would like to search for every string in the my.keyword object without a for loop. If human, lung, mutation, and chromosome all match in the title, the result should be 4. If only 3 of the 4 match, the result should be 3. The same applies to the description. A keyword should count once if it matches, no matter how often it is repeated. Thank you.

jlhoward · Accepted Answer

my.keyword<- c("human", "lung", "mutation", "chromosome")
txt <- "Human lung cancer due to chromosome mutations is the leading cause of cancer related mortality in the United States, with an estimated 221,200 new cases and 158,040 deaths anticipated in 2015 (ACS 2015). "
count.kw <- function(txt) sum(sapply(my.keyword, grepl, x=tolower(txt), fixed=TRUE))
count.kw(txt)
# [1] 4

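Notice how I "edited" your text to include more than one of the keywords. The text is lowercased first, and grepl(..., fixed=TRUE) matches substrings, so "Human" and "mutations" both count.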
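To see why the == comparison from the question returns 0, here is a minimal sketch. The string title is made up for illustration and is not taken from the question's data.

```r
# Hypothetical title string, not from the question's data
title <- "Atlas of mutations in human lung cancer"

# `==` compares the whole string, so it is FALSE unless the title is exactly "human"
(title == "human") + 0
# [1] 0

# grepl() tests whether the pattern occurs anywhere in the string
grepl("human", tolower(title), fixed = TRUE) + 0
# [1] 1
```

count.kw above applies this same grepl() test once per keyword and sums the results.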

This works for 1 string, but not a vector of strings, so we have to vectorize the function:

vcount.lw <- Vectorize(count.kw)
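As a rough illustration of the difference, assuming two made-up strings in x: called on a vector, count.kw returns a single total, while the vectorized version returns one count per string.

```r
my.keyword <- c("human", "lung", "mutation", "chromosome")
count.kw <- function(txt) sum(sapply(my.keyword, grepl, x = tolower(txt), fixed = TRUE))

# Two made-up strings for illustration
x <- c("human lung", "chromosome")

# sum() collapses the whole match matrix, giving one total for both strings
count.kw(x)
# [1] 3

# Vectorize() calls count.kw once per element, giving one count per string
vcount.lw <- Vectorize(count.kw)
unname(vcount.lw(x))
# [1] 2 1
```

unname() is used here only to drop the names that Vectorize attaches when the input is a character vector.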

Then, create an example:

set.seed(1)
rwords <- function(x) paste(paste(my.keyword[sample(1:4,sample(1:4))], collapse= " "),"blah, blah, blah")
df <- data.frame(Title=sapply(1:10,rwords))

and demonstrate the solution.

vcount.lw(df$Title)
#  [1] 2 4 4 3 2 3 4 2 2 2
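The question asks for counts in both the title and the description. One possible way to get there, sketched under assumptions: the two-row my.df below is invented for illustration, and count.kw.vec is a hypothetical helper that uses rowSums() on the match matrix instead of Vectorize().

```r
my.keyword <- c("human", "lung", "mutation", "chromosome")

# Made-up two-row data frame standing in for the question's my.df
my.df <- data.frame(
  Title = c("Atlas of mutations in human lung cancer", "Blah, blah, blah"),
  Description = c("Lung cancer is the leading cause of cancer related mortality",
                  "Chromosome studies in human cells"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a logical matrix (rows = strings, columns = keywords)
# and count the TRUE values in each row
count.kw.vec <- function(x) {
  hits <- vapply(my.keyword, grepl, logical(length(x)), x = tolower(x), fixed = TRUE)
  # matrix() keeps the shape right when x has only one element
  rowSums(matrix(hits, nrow = length(x)))
}

# Apply the helper to every column; one count column per text column
counts <- as.data.frame(lapply(my.df, count.kw.vec))
counts
#   Title Description
# 1     3           1
# 2     0           2
```

Each keyword contributes at most 1 per cell, however often it is repeated, which matches the requirement in the question.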
