How do I search the number of occurrences of individual words in text data?

Question

How do I find the count of occurrences of a list of words? I can search for one word as follows:

x <- dplyr::filter(data, grepl("apple", content, ignore.case = TRUE))
length(x$content)

The | separator lets me match any of several words at once, but that gives a single combined count. I want to count each word individually.

The words could be supplied as a row in a csv or written as a vector in R itself, e.g.:

words <- c("apple","orange","pear","pineapple")

One wrinkle is that data$content is a column of tweets, so a word can occur more than once per tweet. I'd like each tweet to count only once per word, however many times the word occurs in it.

Nate · Accepted Answer

You could get logical values for the presence/absence of your target words like this:

library(tidyverse)

words <- c("apple","orange","pear","pineapple")

data <- tibble(content = c("Ony my grocery list are green apples, red apples and oranges",
                           "My favorite froyo flavors are pineapple, peach-pear and pear"))

boundary_words <- paste0("\\b", words) # "\\b" is a regex word boundary, so the apple in pineapple is not counted

map_dfc(boundary_words, ~ as_tibble(grepl(., data$content))) %>%
    set_names(words) %>%
    bind_cols(data, .)

# A tibble: 2 x 5
                                                       content apple orange  pear pineapple
                                                         <chr> <lgl>  <lgl> <lgl>     <lgl>
1 Ony my grocery list are green apples, red apples and oranges  TRUE   TRUE FALSE     FALSE
2 My favorite froyo flavors are pineapple, peach-pear and pear FALSE  FALSE  TRUE      TRUE
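
Since the question asks for a count per word rather than a TRUE/FALSE column per tweet, the logical values can be summed. Here is a minimal base R sketch of that idea, assuming the same sample data as above; the names content, patterns and tweet_counts are just illustrative, and ignore.case = TRUE is carried over from the question:

```r
# Sample data and word list taken from the example above.
words <- c("apple", "orange", "pear", "pineapple")
content <- c(
  "Ony my grocery list are green apples, red apples and oranges",
  "My favorite froyo flavors are pineapple, peach-pear and pear"
)

# "\\b" anchors each pattern at a word boundary, so "apple" does not
# match inside "pineapple".
patterns <- paste0("\\b", words)

# For each word, grepl() returns one TRUE/FALSE per tweet. sum() counts
# the TRUEs, so a tweet that repeats a word still counts only once.
tweet_counts <- vapply(
  patterns,
  function(p) sum(grepl(p, content, ignore.case = TRUE)),
  integer(1)
)
names(tweet_counts) <- words

tweet_counts
```

The result is a named integer vector with one entry per word, giving the number of tweets that mention it. If the words live in a CSV instead, reading them into a character vector first should let the rest work unchanged.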
