I have a data frame with one column (7,234 rows) of YouTube video titles. I also have a separate list of 71 keywords.
I would like to find the frequency of each keyword across all 7,234 rows.
Using str_detect, I'm able to find the frequency of each keyword separately. This gives me a logical result when I use summary:
Mode FALSE TRUE
logical 1462 5772
I am not sure how to use a for loop to do this for all keywords, though, or how to put the results into a new data frame with the columns Video Title, Freq True and Freq False.
Thanks
You don't need a for loop. Just split the text into single words, count them, and keep only the key words with their frequencies:
Toy data:
words <- c("apple", "pear", "grape")
sentences <- c("I have an apple and a pear",
               "Grape is my favorite but I also like apple",
               "I don't like pear and I don't like apple or applepie",
               "She hates fruit")
library(dplyr)
library(tidyr)
data.frame(sentences) %>%
  # separate sentences into single words:
  separate_rows(sentences, sep = "\\s") %>%
  # convert to lower-case:
  mutate(sentences = tolower(sentences)) %>%
  group_by(sentences) %>%
  # count:
  summarise(freq = n()) %>%
  filter(sentences %in% words)
# A tibble: 3 x 2
  sentences  freq
  <chr>     <int>
1 apple 3
2 grape 1
3 pear 2
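Note that the pipeline above counts total word occurrences, not the number of titles that contain each keyword, which is what the question's str_detect/summary output reports. If you need those per-keyword TRUE/FALSE counts in a data frame, a minimal base-R sketch (again without an explicit for loop) could look like this. It assumes the keywords are plain words without regex metacharacters and matches them as whole words, case-insensitively; dropping the \\b anchors gives str_detect-style substring matching instead, where "applepie" would also count as "apple". The column names keyword, freq_true and freq_false are my own choice, and you would swap sentences for your title column (e.g. a hypothetical df$title).

```r
# Same toy data as above; replace `sentences` with your title column
words <- c("apple", "pear", "grape")
sentences <- c("I have an apple and a pear",
               "Grape is my favorite but I also like apple",
               "I don't like pear and I don't like apple or applepie",
               "She hates fruit")

# One logical column per keyword: does each title contain it as a whole word?
# (assumption: whole-word, case-insensitive matching via \b anchors)
contains <- vapply(words, function(w) {
  grepl(paste0("\\b", w, "\\b"), sentences, ignore.case = TRUE, perl = TRUE)
}, logical(length(sentences)))

# Count TRUE/FALSE per keyword column and collect into a data frame
freq <- data.frame(
  keyword    = words,
  freq_true  = colSums(contains),
  freq_false = colSums(!contains),
  row.names  = NULL
)
print(freq)
```

On the toy data this gives 3, 2 and 1 matching titles for apple, pear and grape. vapply() is used rather than sapply() so the result is always a logical matrix with one row per title, even if a keyword matches nothing.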