Jaylon Aaron

How do I count the number of words from a list mentioned in a data frame in R?

I have a data frame with a Review column and a Text column, with multiple rows. I also have a list of words. I want a for loop that examines each row of the data frame and counts how many of the words from the list appear in it. I want to keep each row's count separate and place the results into a new result data frame.

#Data Frame
Review           Text
1           I like to run and play.
2           I eat cookies.
3           I went to swim in the pool.
4           I like to sleep.
5           I like to run, play, swim, and eat.

#List Words
Run
Play
Eat
Swim

#Result Data Frame
Review      Count
1            2
2            1
3            1
4            0
5            4
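
The row-by-row loop described above might look like the following minimal base-R sketch. It assumes the data frame is called df with columns Review and Text, that the word list is a character vector called words, and that words should be matched as whole words regardless of case; these names and choices are assumptions for illustration.

```r
# Sample data from the question (names df and words are assumed)
df <- data.frame(
  Review = 1:5,
  Text = c("I like to run and play.",
           "I eat cookies.",
           "I went to swim in the pool.",
           "I like to sleep.",
           "I like to run, play, swim, and eat."),
  stringsAsFactors = FALSE
)
words <- c("Run", "Play", "Eat", "Swim")

# One regular expression matching any listed word as a whole word
pattern <- paste0("\\b(", paste(tolower(words), collapse = "|"), ")\\b")

# Pre-allocate the result data frame, one row per review
result <- data.frame(Review = df$Review, Count = 0L)

for (i in seq_len(nrow(df))) {
  # gregexpr() returns -1 when the row contains no match
  hits <- gregexpr(pattern, tolower(df$Text[i]), perl = TRUE)[[1]]
  result$Count[i] <- if (hits[1] == -1) 0L else length(hits)
}

result
```

Pre-allocating result and filling it by index avoids growing a data frame inside the loop.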

Upvotes: 0

Views: 635

Answers (3)

ThomasIsCoding

Here is a base R solution that uses gregexpr to count occurrences.

Given the patterns below

pat <- c("Run", "Play", "Eat", "Swim")

the counts can be added to the data frame via:

df$Count <- sapply(gregexpr(paste0(tolower(pat),collapse = "|"),tolower(df$Text)), 
                   function(v) ifelse(-1 %in% v, 0,length(v)))

which gives

> df
  Review                                Text Count
1      1              I like to run and play     2
2      2                       I eat cookies     1
3      3         I went to swim in the pool.     1
4      4                    I like to sleep.     0
5      5 I like to run, play, swim, and eat.     4
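
gregexpr() marks a row with no match by returning -1, which is why the function above checks for -1 before taking the length. A more compact variant, shown here as a sketch, passes the matches through regmatches(), which turns a no-match into character(0), so lengths() gives 0 directly. It also wraps the alternation in \\b word boundaries, an addition not in the answer above, so that for example "eat" is not counted inside "sweat".

```r
df <- data.frame(Review = 1:5,
                 Text = c("I like to run and play",
                          "I eat cookies",
                          "I went to swim in the pool.",
                          "I like to sleep.",
                          "I like to run, play, swim, and eat."),
                 stringsAsFactors = FALSE)
pat <- c("Run", "Play", "Eat", "Swim")

# Whole-word alternation: \b(run|play|eat|swim)\b
rx <- paste0("\\b(", paste0(tolower(pat), collapse = "|"), ")\\b")

# Extract every match per row, then count the matches in each row
txt <- tolower(df$Text)
df$Count <- lengths(regmatches(txt, gregexpr(rx, txt, perl = TRUE)))
df
```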

Upvotes: 1

hello_friend

Base R solution (note that this solution is intentionally case-insensitive):

# Create a vector of patterns to search for: 

patterns <- c("Run", "Play", "Eat", "Swim")

# Split on the review number, apply a term counting function (for each review number): 

df$term_count <- sapply(split(df, df$Review),
                        function(x) {
                          length(grep(paste0(tolower(patterns), collapse = "|"),
                                      tolower(unlist(strsplit(x$Text, "\\s+")))))
                        })

Data:

df <- data.frame(Review = 1:5, Text = as.character(c("I like to run and play",
                                                     "I eat cookies",
                                                     "I went to swim in the pool.",
                                                     "I like to sleep.", 
                                                     "I like to run, play, swim, and eat.")), 
                 stringsAsFactors = FALSE)
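
Because grep() counts a token when a pattern occurs anywhere inside it, a token such as "played" would also count for "play", and a token containing two patterns counts only once. A sketch that instead compares whole tokens with %in% is shown below; it assumes words consist only of the ASCII letters a-z, so anything else (punctuation, digits, apostrophes) is treated as a separator.

```r
df <- data.frame(Review = 1:5,
                 Text = c("I like to run and play",
                          "I eat cookies",
                          "I went to swim in the pool.",
                          "I like to sleep.",
                          "I like to run, play, swim, and eat."),
                 stringsAsFactors = FALSE)
patterns <- c("Run", "Play", "Eat", "Swim")

# Split each review into lower-case tokens on runs of non-letters,
# then count the tokens that exactly equal one of the patterns
tokens <- strsplit(tolower(df$Text), "[^a-z]+")
df$term_count <- vapply(tokens,
                        function(tok) sum(tok %in% tolower(patterns)),
                        integer(1))
df
```

Working on the Text column directly, rather than on split(df, df$Review), also keeps the counts in the original row order even if Review is not sorted.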

Upvotes: 0

Ronak Shah

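We can use stringr::str_count after pasting the words together into one pattern, wrapping each word in \\b word boundaries so that only whole words are counted. Note that the match is case-sensitive: it works here because the words are lower-cased and appear in lower case in the text.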

df$Count <- stringr::str_count(df$Text, 
                   paste0("\\b", tolower(words), "\\b", collapse = "|"))

df
#  Review                                Text Count
#1      1             I like to run and play.     2
#2      2                      I eat cookies.     1
#3      3         I went to swim in the pool.     1
#4      4                    I like to sleep.     0
#5      5 I like to run, play, swim, and eat.     4

Data:

df <- structure(list(Review = 1:5, Text = structure(c(2L, 1L, 5L, 4L, 
3L), .Label = c("I eat cookies.", "I like to run and play.", 
"I like to run, play, swim, and eat.", "I like to sleep.", 
"I went to swim in the pool."), class = "factor")), class = 
"data.frame", row.names = c(NA, -5L))
words <- c("Run","Play","Eat","Swim")
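
If you also want to see which words were found in each review, the following base-R sketch (it does not use stringr) counts each word separately with whole-word, case-insensitive matching and then adds up the columns with rowSums(). The vector name txt is an assumption for illustration; the column names of the matrix come from words.

```r
txt <- c("I like to run and play.",
         "I eat cookies.",
         "I went to swim in the pool.",
         "I like to sleep.",
         "I like to run, play, swim, and eat.")
words <- c("Run", "Play", "Eat", "Swim")

# One column per word: how often that word appears (case-insensitive) in each text
per_word <- vapply(words, function(w) {
  lengths(regmatches(txt, gregexpr(paste0("\\b", w, "\\b"), txt,
                                   ignore.case = TRUE, perl = TRUE)))
}, integer(length(txt)))

per_word
rowSums(per_word)
```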

Upvotes: 0
