Reputation: 93
I have a dataset called colour, in which I want to find keywords from a list that I have created (colouryellow, colourblue, colourwhite). This is an example of the dataset:
USER | MESSAGE |
---|---|
23456 | The colouryellow is very bright! |
31245 | Most girls like colourpink |
99999 | I am having a break |
9877 | The colouryellow is like the sun |
Is there a way to obtain the number of times each keyword in the list appears in the MESSAGE column? For example, the output would look like this:
Keyword | Frequency of Keywords |
---|---|
colouryellow | 2 |
colourblue | 0 |
colourwhite | 0 |
I have tried the following code, but it does not give me the frequency of each keyword; instead it returns the rows that match any of the keywords, all together.
colour= read.csv("C: xxxxxx")
keywordcount = dplyr::filter(colour, grepl("colouryellow|colourblue|colourwhite", MESSAGE))
Thank you in advance.
Upvotes: 1
Views: 72
Reputation: 160597
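Here are a few things you can do in base R: test whether each message contains any keyword, count the matches per message, and tabulate the matches across all messages.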
some_colours <- c("colouryellow", "colourblue", "colourwhite")
some_col_regex <- paste0("\\b(", paste(some_colours, collapse = "|"), ")\\b")
grepl(some_col_regex, colour$MESSAGE)
# [1] TRUE FALSE FALSE TRUE
lengths(regmatches(colour$MESSAGE, gregexpr(some_col_regex, colour$MESSAGE)))
# [1] 1 0 0 1
table(unlist(regmatches(colour$MESSAGE, gregexpr(some_col_regex, colour$MESSAGE))))
# colouryellow
# 2
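Note that the `table()` call above only lists keywords that actually occur, so `colourblue` and `colourwhite` are missing rather than shown as 0. One possible way to get the zero counts the question asks for is to convert the matches to a factor whose levels are the full keyword list. This is a minimal sketch using the sample data from the Data section; `found`, `counts` and `keyword_freq` are names made up for illustration.

```r
# Sample data, same values as in the Data section of this answer
colour <- data.frame(
  USER = c(23456L, 31245L, 99999L, 9877L),
  MESSAGE = c("The colouryellow is very bright!",
              "Most girls like colourpink",
              "I am having a break",
              "The colouryellow is like the sun"),
  stringsAsFactors = FALSE
)

some_colours <- c("colouryellow", "colourblue", "colourwhite")
some_col_regex <- paste0("\\b(", paste(some_colours, collapse = "|"), ")\\b")

# Every matched keyword across all messages, as one character vector
found <- unlist(regmatches(colour$MESSAGE, gregexpr(some_col_regex, colour$MESSAGE)))

# A factor with all keywords as levels makes table() report unused levels as 0
counts <- table(factor(found, levels = some_colours))

# Reshape into the two-column layout shown in the question
# (keyword_freq is a hypothetical name)
keyword_freq <- data.frame(
  Keyword = names(counts),
  Frequency = as.vector(counts),
  stringsAsFactors = FALSE
)
keyword_freq
#        Keyword Frequency
# 1 colouryellow         2
# 2   colourblue         0
# 3  colourwhite         0
```

This counts every occurrence, so a message that mentions the same keyword twice contributes 2 to that keyword's total.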
Data
colour <- structure(list(
  USER = c(23456L, 31245L, 99999L, 9877L),
  MESSAGE = c("The colouryellow is very bright!", "Most girls like colourpink",
              "I am having a break", "The colouryellow is like the sun")
), class = "data.frame", row.names = c(NA, -4L))
Upvotes: 1