Nicholas

Reputation: 3737

R - Counting total occurrences of words from a list in a data frame, grouped by ID

I have a data frame like this:

ID   Word
1    Tree
1    House
1    Tree
2    Snail
2    Tree
3    Car

And I have a list of keywords I want to check for:

(House, Tree, Bird)

For each ID, I want to know how many times any word from my list of keywords appears.

That is, the words House, Tree or Bird appear 3 times in ID 1, only once in ID 2, and not at all in ID 3:

ID   Count
1     3
2     1
3     0

I am not sure how to tackle this. I know how to count the number of times a word appears within each ID, but not how many times the words from another list appear.

Thank you for any suggestions/guidance etc.

Upvotes: 1

Views: 48

Answers (2)

Ronak Shah

Reputation: 389225

In base R, we can use table to count, for each ID, how many entries of Word are in vec, where vec is the keyword vector c("House", "Tree", "Bird").

table(df$ID, df$Word %in% vec)

#    FALSE TRUE
#  1     0    3
#  2     1    1
#  3     1    0

Here the row names (1, 2, 3) are the IDs. FALSE counts the words in each ID that are not present in vec, whereas TRUE counts the words that are present.
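To make the intermediate step visible, here is a minimal sketch that rebuilds the example data (assuming the columns are named ID and Word and the keyword vector is vec, as in this answer) and shows the logical vector that table() cross-tabulates against ID. The names is_keyword and tab are only illustrative.

```r
# Example data from the question (column names ID and Word assumed)
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 3),
  Word = c("Tree", "House", "Tree", "Snail", "Tree", "Car")
)
vec <- c("House", "Tree", "Bird")

# Step 1: logical vector, TRUE where Word is one of the keywords
is_keyword <- df$Word %in% vec
print(is_keyword)

# Step 2: cross-tabulate ID (rows) against the logical vector (columns)
tab <- table(df$ID, is_keyword)
print(tab)

# The keyword count for one ID is the cell in the TRUE column
tab["1", "TRUE"]
```

Each row of tab sums to the number of words for that ID, so the TRUE column is exactly the per-ID keyword count.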

To get the exact expected output, we can convert the table to a data frame and keep only the TRUE column as the count.

data <- as.data.frame.matrix(table(df$ID, df$Word %in% vec))
data.frame(ID = rownames(data), count = data$`TRUE`)

#  ID count
#1  1     3
#2  2     1
#3  3     0
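One caveat worth guarding against: if no word in any ID is a keyword, the TRUE column is missing from the table, so data$`TRUE` is NULL and there is no count to extract. A minimal sketch, assuming that edge case matters for your data, fixes the factor levels so both columns always exist. count_keywords is a hypothetical helper name, not part of any package.

```r
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 3),
  Word = c("Tree", "House", "Tree", "Snail", "Tree", "Car")
)

# Hypothetical helper: per-ID count of words that are in vec
count_keywords <- function(df, vec) {
  # Fixing the levels guarantees both a FALSE and a TRUE column,
  # even when no word matches at all
  hit <- factor(df$Word %in% vec, levels = c(FALSE, TRUE))
  tab <- table(df$ID, hit)
  data.frame(ID = rownames(tab), count = as.vector(tab[, "TRUE"]))
}

count_keywords(df, c("House", "Tree", "Bird"))
count_keywords(df, c("Bird"))   # no matches: every count is 0 instead of a missing column
```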

Upvotes: 1

akrun

Reputation: 887741

We can create a logical index with %in% and sum it grouped by 'ID'. I'm not sure whether 'v1' is a vector or a list; if it is a list, use unlist(v1) with the same code.

library(dplyr)
df1 %>% 
   group_by(ID) %>% 
   summarise(Count = sum(Word %in% v1))
# A tibble: 3 x 2
#     ID Count
#  <int> <int>
#1     1     3
#2     2     1
#3     3     0
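The same logical-index-and-sum idea can be sketched in base R without dplyr, using tapply(). This minimal sketch assumes v1 arrives as a list (the case mentioned above) and flattens it with unlist() first; the names keywords, hit, counts and result are only illustrative.

```r
df1 <- data.frame(
  ID   = c(1L, 1L, 1L, 2L, 2L, 3L),
  Word = c("Tree", "House", "Tree", "Snail", "Tree", "Car")
)

# Keywords supplied as a list rather than a vector (assumed case)
v1 <- list("House", "Tree", "Bird")
keywords <- unlist(v1)   # flatten to a character vector

# Logical index: TRUE where Word is a keyword
hit <- df1$Word %in% keywords

# Sum the index within each ID (TRUE counts as 1, FALSE as 0)
counts <- tapply(hit, df1$ID, sum)
result <- data.frame(ID = as.integer(names(counts)), Count = as.vector(counts))
print(result)
```

Because every ID has at least one row, tapply() returns a group for ID 3 as well, and its sum is simply 0.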

Or filter first and then count:

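df1 %>% 
   mutate(ID = factor(ID)) %>%   # factor keeps every ID as a level, even after filtering
   filter(Word %in% v1) %>%
   count(ID, .drop = FALSE)      # .drop = FALSE only keeps empty groups for factors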
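The filter-then-count route has a subtlety: once rows are filtered out, an ID with no keyword rows has nothing left to count. A minimal base R sketch, assuming the same df1 and v1 as in this answer, shows that converting ID to a factor before subsetting keeps that ID as a level, so table() reports it with a count of 0, while subsetting the plain integer column loses it. The names id_f, kept and counts are only illustrative.

```r
df1 <- data.frame(
  ID   = c(1L, 1L, 1L, 2L, 2L, 3L),
  Word = c("Tree", "House", "Tree", "Snail", "Tree", "Car")
)
v1 <- c("House", "Tree", "Bird")

# Convert ID to a factor BEFORE filtering, so all IDs stay as levels
id_f <- factor(df1$ID)

# Filter: keep only the IDs of rows whose Word is a keyword
kept <- id_f[df1$Word %in% v1]

# Count: table() on a factor reports every level, including empty ones
counts <- table(kept)
print(counts)

# For comparison: filtering the plain integer column drops ID 3 entirely
print(table(df1$ID[df1$Word %in% v1]))
```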

data

v1 <- c("House", "Tree", "Bird")
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L), Word = c("Tree", 
"House", "Tree", "Snail", "Tree", "Car")), class = "data.frame", 
row.names = c(NA, -6L))

Upvotes: 2
