Count the number of keywords based on a list

Question

I have a dataset called colour, in which I want to count some keywords from a list that I have created (colouryellow, colourblue, colourwhite). This is an example of the dataset:

USER	MESSAGE
23456	The colouryellow is very bright!
31245	Most girls like colourpink
99999	I am having a break
9877	The colouryellow is like the sun

Is there a way to obtain the number of times each keyword from the list appears in the MESSAGE column? For example, the output would look like this:

Keyword	Frequency of Keywords
colouryellow	2
colourblue	0
colourwhite	0

I have tried the following code, but it does not give me the frequency of each keyword; instead, it returns the matching rows together.

   colour= read.csv("C: xxxxxx")

   keywordcount= dplyr::filter(colour, grepl("colouryellow|colourblue|colourwhite", MESSAGE))

Thank you in advance.

r2evans · Accepted Answer

Here are a few things you can do with a single regex built from the keyword list.

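some_colours <- c("colouryellow", "colourblue", "colourwhite")
# \\b is a word boundary; the backslash must be doubled inside an R string
some_col_regex <- paste0("\\b(", paste(some_colours, collapse = "|"), ")\\b")
# does each message contain at least one keyword?
grepl(some_col_regex, colour$MESSAGE)
# [1]  TRUE FALSE FALSE  TRUE
# how many keyword matches are in each message?
lengths(regmatches(colour$MESSAGE, gregexpr(some_col_regex, colour$MESSAGE)))
# [1] 1 0 0 1
# how often does each matched keyword occur across all messages?
table(unlist(regmatches(colour$MESSAGE, gregexpr(some_col_regex, colour$MESSAGE))))
# colouryellow 
#            2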
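
Note that table() only reports keywords that were actually matched, so colourblue and colourwhite do not show up with a count of 0 as the question asked. One possible way to get the zero rows, sketched here as an assumption rather than as part of the original answer, is to convert the matches to a factor whose levels are the full keyword list before tabulating. The names found, counts, keyword_freq and the column name Frequency are illustrative choices, not taken from the question.

```r
# Sample data, same as in the Data section of this answer
colour <- data.frame(
  USER = c(23456L, 31245L, 99999L, 9877L),
  MESSAGE = c("The colouryellow is very bright!",
              "Most girls like colourpink",
              "I am having a break",
              "The colouryellow is like the sun"),
  stringsAsFactors = FALSE
)

some_colours <- c("colouryellow", "colourblue", "colourwhite")
some_col_regex <- paste0("\\b(", paste(some_colours, collapse = "|"), ")\\b")

# All keyword matches across every message, as one character vector
found <- unlist(regmatches(colour$MESSAGE, gregexpr(some_col_regex, colour$MESSAGE)))

# Using the keyword list as factor levels keeps unmatched keywords at 0
counts <- table(factor(found, levels = some_colours))

# Reshape into a two-column data frame (column names are illustrative)
keyword_freq <- data.frame(
  Keyword = names(counts),
  Frequency = as.vector(counts),
  stringsAsFactors = FALSE
)
keyword_freq
```

With the sample data this should give 2 for colouryellow and 0 for the other two keywords. The factor levels also fix the row order to match the order of some_colours.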

Data

colour <- structure(list(USER = c(23456L, 31245L, 99999L, 9877L), MESSAGE = c("The colouryellow is very bright!", "Most girls like colourpink", "I am having a break", "The colouryellow is like the sun")), class = "data.frame", row.names = c(NA, -4L))
