Reputation: 41
I have a data frame with a Review column and a Text column, and multiple rows. I also have a list of words. I want a for loop that examines each row of the data frame and counts how many words from the list appear in that row's text. I want to keep the counts separate for each row and place the results in a new result data frame.
#Data Frame
Review Text
1 I like to run and play.
2 I eat cookies.
3 I went to swim in the pool.
4 I like to sleep.
5 I like to run, play, swim, and eat.
#List Words
Run
Play
Eat
Swim
#Result Data Frame
Review Count
1 2
2 1
3 1
4 0
5 4
Upvotes: 0
Views: 635
Reputation: 102599
Here is a base R solution, where gregexpr is used for counting occurrences.
Given the pattern as below
pat <- c("Run", "Play", "Eat", "Swim")
then the counts can be added to the data frame via:
# gregexpr() returns -1 for a row with no match, so that case is mapped to 0
df$Count <- sapply(
  gregexpr(paste0(tolower(pat), collapse = "|"), tolower(df$Text)),
  function(v) ifelse(-1 %in% v, 0, length(v))
)
such that
> df
Review Text Count
1 1 I like to run and play 2
2 2 I eat cookies 1
3 3 I went to swim in the pool. 1
4 4 I like to sleep. 0
5 5 I like to run, play, swim, and eat. 4
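The special case for -1 above exists because gregexpr reports "no match" as a single -1 rather than an empty result. As a minimal sketch of the same idea, and assuming sample data that mirrors the question, the matches can be extracted with regmatches and counted with lengths, which gives 0 for rows without a match and needs no special case. The names rx, m and counts are only illustrative.

```r
# Sample data mirroring the question (an assumption; substitute your own data frame)
df <- data.frame(
  Review = 1:5,
  Text = c("I like to run and play.",
           "I eat cookies.",
           "I went to swim in the pool.",
           "I like to sleep.",
           "I like to run, play, swim, and eat."),
  stringsAsFactors = FALSE
)
pat <- c("Run", "Play", "Eat", "Swim")

# Build one alternation pattern from the lower-cased words
rx <- paste0(tolower(pat), collapse = "|")

# Match positions per row; a row without any match holds a single -1
m <- gregexpr(rx, tolower(df$Text))

# regmatches() returns character(0) for rows without a match,
# so lengths() yields 0 for them automatically
counts <- lengths(regmatches(tolower(df$Text), m))

df$Count <- counts
```

Like the original, this counts substrings, so a word such as "treat" would also be counted as a hit for "eat".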
Upvotes: 1
Reputation: 5798
Base R solution (note this solution is intentionally case insensitive):
# Create a vector of patterns to search for:
patterns <- c("Run", "Play", "Eat", "Swim")
# Split on the review number, apply a term counting function (for each review number):
df$term_count <- sapply(split(df, df$Review), function(x) {
  length(grep(paste0(tolower(patterns), collapse = "|"),
              tolower(unlist(strsplit(x$Text, "\\s+")))))
})
Data:
df <- data.frame(Review = 1:5, Text = as.character(c("I like to run and play",
"I eat cookies",
"I went to swim in the pool.",
"I like to sleep.",
"I like to run, play, swim, and eat.")),
stringsAsFactors = FALSE)
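Since the question asks for an explicit for loop and a separate result data frame, the same token-matching approach could be written out as a loop. This is only a sketch under the same sample data; the names rx, tokens and result are illustrative and not part of the original answer.

```r
# Sample data as in the answer above
df <- data.frame(
  Review = 1:5,
  Text = c("I like to run and play",
           "I eat cookies",
           "I went to swim in the pool.",
           "I like to sleep.",
           "I like to run, play, swim, and eat."),
  stringsAsFactors = FALSE
)
patterns <- c("Run", "Play", "Eat", "Swim")

# One case-insensitive alternation pattern built from the word list
rx <- paste0(tolower(patterns), collapse = "|")

# Pre-allocate the result data frame with one row per review
result <- data.frame(Review = df$Review, Count = integer(nrow(df)))

for (i in seq_len(nrow(df))) {
  # Split the review into whitespace-separated tokens
  tokens <- tolower(unlist(strsplit(df$Text[i], "\\s+")))
  # Count the tokens that contain any of the words
  result$Count[i] <- length(grep(rx, tokens))
}
```

The loop counts tokens that contain a word, so punctuation attached to a token ("play,") does not prevent a match, but a longer word containing one of the patterns would also be counted.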
Upvotes: 0
Reputation: 389225
We can use stringr::str_count after pasting the words together as one pattern.
# Lower-case the text as well, so that a capitalised word such as "Run" still matches
df$Count <- stringr::str_count(tolower(df$Text),
                               paste0("\\b", tolower(words), "\\b", collapse = "|"))
df
# Review Text Count
#1 1 I like to run and play. 2
#2 2 I eat cookies. 1
#3 3 I went to swim in the pool. 1
#4 4 I like to sleep. 0
#5 5 I like to run, play, swim, and eat. 4
data
df <- structure(list(Review = 1:5, Text = structure(c(2L, 1L, 5L, 4L,
3L), .Label = c("I eat cookies.", "I like to run and play.",
"I like to run, play, swim, and eat.", "I like to sleep.",
"I went to swim in the pool."), class = "factor")), class =
"data.frame", row.names = c(NA, -5L))
words <- c("Run","Play","Eat","Swim")
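To illustrate what the "\\b" word boundaries in the pasted pattern change, here is a small base R sketch (not the stringr call itself) that compares a plain alternation with a bounded one. The helper count_matches and the example sentences are hypothetical and only serve the comparison.

```r
words <- c("Run", "Play", "Eat", "Swim")

# Plain alternation: matches the words anywhere, including inside longer words
plain <- paste0(tolower(words), collapse = "|")

# Bounded alternation: each word must stand on its own
bounded <- paste0("\\b", tolower(words), "\\b", collapse = "|")

# Hypothetical helper: number of case-insensitive matches per string
count_matches <- function(pattern, text) {
  m <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)
  lengths(regmatches(text, m))
}

# Made-up sentences; the second contains "eat" only inside other words
text <- c("I eat cookies.", "What a great treat.", "Run, then swim.")

plain_counts   <- count_matches(plain, text)
bounded_counts <- count_matches(bounded, text)
```

With the plain pattern, "great" and "treat" each contribute a hit for "eat"; with the bounded pattern they do not, which is usually what is wanted when counting whole words.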
Upvotes: 0