Louise

Reputation: 93

Count the number of keywords based on a list

I have a dataset called colour in which I want to find some keywords from a list I created (colouryellow, colourblue, colourwhite). This is an example of the dataset:

USER   MESSAGE
23456  The colouryellow is very bright!
31245  Most girls like colourpink
99999  I am having a break
9877   The colouryellow is like the sun

Is there a way to count how many times each keyword in the list appears in the MESSAGE column? For example, the output would look like this:

Keyword       Frequency
colouryellow  2
colourblue    0
colourwhite   0

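I tried the following code, but it does not give me the frequency of each keyword. Instead, it just returns the rows that match any of the keywords, all together.

   colour <- read.csv("C: xxxxxx")

   # keeps the rows whose MESSAGE matches any keyword; it does not count each keyword separately
   keywordcount <- dplyr::filter(colour, grepl("colouryellow|colourblue|colourwhite", MESSAGE))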

Thank you in advance.

Upvotes: 1

Views: 72

Answers (1)

r2evans

Reputation: 160597

Here are a few things you can do (the sample data is at the end of this answer).

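some_colours <- c("colouryellow", "colourblue", "colourwhite")
# one regex matching any keyword as a whole word: \b(colouryellow|colourblue|colourwhite)\b
some_col_regex <- paste0("\\b(", paste(some_colours, collapse = "|"), ")\\b")
# which messages contain at least one keyword?
grepl(some_col_regex, colour$MESSAGE)
# [1]  TRUE FALSE FALSE  TRUE
# how many keyword matches are in each message?
lengths(regmatches(colour$MESSAGE, gregexpr(some_col_regex, colour$MESSAGE)))
# [1] 1 0 0 1
# total count per keyword found (keywords with no matches are omitted)
table(unlist(regmatches(colour$MESSAGE, gregexpr(some_col_regex, colour$MESSAGE))))
# colouryellow 
#            2 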
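The table() call above drops keywords that never match, while the question also asks for the zero counts. A minimal sketch that keeps them is to turn the matches into a factor whose levels are the full keyword list. It assumes the keyword vector is the complete set of levels you want reported. The sample data is rebuilt so the snippet runs on its own, and the names matches, counts and keyword_freq are just illustrative:

```r
# Sample data from the question
colour <- data.frame(
  USER = c(23456L, 31245L, 99999L, 9877L),
  MESSAGE = c("The colouryellow is very bright!", "Most girls like colourpink",
              "I am having a break", "The colouryellow is like the sun"),
  stringsAsFactors = FALSE
)

some_colours <- c("colouryellow", "colourblue", "colourwhite")
some_col_regex <- paste0("\\b(", paste(some_colours, collapse = "|"), ")\\b")

# Every keyword match across all messages, flattened into one character vector
matches <- unlist(regmatches(colour$MESSAGE, gregexpr(some_col_regex, colour$MESSAGE)))

# Using all keywords as factor levels makes table() report zero counts too
counts <- table(factor(matches, levels = some_colours))

# Two-column data frame in the layout asked for in the question
keyword_freq <- data.frame(Keyword = names(counts),
                           Frequency = as.integer(counts),
                           stringsAsFactors = FALSE)
keyword_freq
#        Keyword Frequency
# 1 colouryellow         2
# 2   colourblue         0
# 3  colourwhite         0
```

The rows come out in the order of some_colours because that order is used for the factor levels.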

Data

colour <- structure(list(USER = c(23456L, 31245L, 99999L, 9877L), MESSAGE = c("The colouryellow is very bright!", "Most girls like colourpink", "I am having a break", "The colouryellow is like the sun")), class = "data.frame", row.names = c(NA, -4L))
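If you would rather count each keyword separately, for example to get one count column per keyword for every message, a base-R sketch could loop over the keywords instead of building one combined regex. This assumes the keywords contain no regex metacharacters. count_keyword and per_message are hypothetical names used only for illustration:

```r
colour <- data.frame(
  USER = c(23456L, 31245L, 99999L, 9877L),
  MESSAGE = c("The colouryellow is very bright!", "Most girls like colourpink",
              "I am having a break", "The colouryellow is like the sun"),
  stringsAsFactors = FALSE
)

some_colours <- c("colouryellow", "colourblue", "colourwhite")

# Hypothetical helper: whole-word occurrences of one keyword in each message.
# Assumes `keyword` contains no regex metacharacters.
count_keyword <- function(keyword, text) {
  hits <- gregexpr(paste0("\\b", keyword, "\\b"), text)
  # gregexpr() returns -1 when there is no match, so count only positive positions
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Matrix of counts: one row per message, one column per keyword
per_message <- sapply(some_colours, count_keyword, text = colour$MESSAGE)

# Total per keyword: colouryellow = 2, colourblue = 0, colourwhite = 0
colSums(per_message)
```

Unlike grepl(), this counts repeated occurrences within the same message, and cbind(colour, per_message) would attach the counts to the data as new columns.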

Upvotes: 1
