Reputation: 3737
I have a data frame like this:
ID Word
1 Tree
1 House
1 Tree
2 Snail
2 Tree
3 Car
And I have a list of keywords I want to check for:
(House, Tree, Bird)
For each ID, I want to know how many times any word from my list of keywords appears.
I.e. the words House, Tree or Bird appear 3 times in total for ID 1, only once for ID 2, and not at all for ID 3, so the expected output is:
ID Count
1 3
2 1
3 0
I am not sure how to tackle this. I know how to count the number of times a word appears within each ID, but not how many times the words from another list appear.
Thank you for any suggestions/guidance etc.
Upvotes: 1
Views: 48
Reputation: 389225
In base R, we can use table to count, for each ID, the number of Word values that are in vec.
table(df$ID, df$Word %in% vec)
#     FALSE TRUE
#   1     0    3
#   2     1    1
#   3     1    0
Here the row names (1, 2, 3) are the IDs. FALSE is the count of Word values that are not present in vec for each ID, whereas TRUE is the count of Word values that are present for each ID.
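To see where those counts come from, it can help to look at the logical vector that %in% produces before it is tabulated. This is a minimal sketch, assuming df and vec are built from the question's example (the answer does not show how they were defined, so this construction is an assumption):

```r
# Assumed reconstruction of the question's data; not shown in the original answer
df <- data.frame(ID = c(1, 1, 1, 2, 2, 3),
                 Word = c("Tree", "House", "Tree", "Snail", "Tree", "Car"))
vec <- c("House", "Tree", "Bird")

# TRUE where the Word is one of the keywords, FALSE otherwise
is_keyword <- df$Word %in% vec
is_keyword
# [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

# Cross-tabulating ID against this vector gives the FALSE/TRUE columns
counts <- table(df$ID, is_keyword)
```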
To get the exact expected output, we can convert the table to a data frame and keep only the TRUE column as the count.
data <- as.data.frame.matrix(table(df$ID, df$Word %in% vec))
data.frame(ID = rownames(data), count = data$`TRUE`)
#   ID count
# 1  1     3
# 2  2     1
# 3  3     0
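Because TRUE counts as 1 when summed, the same result can probably also be reached without building the full two-column table. The following is a sketch of that alternative using aggregate (a different route from the table approach, not the answerer's method), again assuming df and vec mirror the question's example; the hit column name is made up for illustration:

```r
# Assumed reconstruction of the question's data
df <- data.frame(ID = c(1, 1, 1, 2, 2, 3),
                 Word = c("Tree", "House", "Tree", "Snail", "Tree", "Car"))
vec <- c("House", "Tree", "Bird")

# Flag keyword rows (hypothetical helper column), then sum the flags within each ID
df$hit <- df$Word %in% vec
result <- aggregate(hit ~ ID, data = df, FUN = sum)
names(result) <- c("ID", "Count")
result
#   ID Count
# 1  1     3
# 2  2     1
# 3  3     0
```

IDs with no matching rows still appear here with a count of 0, because the rows are flagged rather than filtered out before grouping.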
Upvotes: 1
Reputation: 887741
We can create a logical index and get the sum grouped by 'ID'. It is not clear whether 'v1' is a vector or a list (if it is a list, call unlist(v1) first and use the result with the same code).
library(dplyr)
df1 %>%
  group_by(ID) %>%
  summarise(Count = sum(Word %in% v1))
# A tibble: 3 x 2
#      ID Count
#   <int> <int>
# 1     1     3
# 2     2     1
# 3     3     0
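The caveat about 'v1' possibly being a list can be handled up front. This is a small base R sketch, assuming the keywords arrive as a list; the names kw_list, words and ids are hypothetical and used only for illustration:

```r
# Hypothetical: keywords stored as a list rather than a character vector
kw_list <- list("House", "Tree", "Bird")

# Flatten to a character vector only when needed
v1 <- if (is.list(kw_list)) unlist(kw_list) else kw_list

# Stand-ins for the Word and ID columns of the question's data
words <- c("Tree", "House", "Tree", "Snail", "Tree", "Car")
ids <- c(1L, 1L, 1L, 2L, 2L, 3L)

# Same logical-index-and-sum idea, done per ID with tapply
counts <- tapply(words %in% v1, ids, sum)
```

Here counts is a named vector with one entry per ID, which plays the same role as the Count column in the dplyr version.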
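Or filter and then do a count. Note that .drop = FALSE only keeps empty groups for factor columns, so ID is converted to a factor first; otherwise ID 3 would be missing from the result.
df1 %>%
  mutate(ID = factor(ID)) %>%
  filter(Word %in% v1) %>%
  count(ID, .drop = FALSE)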
# Data used in the examples above
v1 <- c("House", "Tree", "Bird")
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L),
                      Word = c("Tree", "House", "Tree", "Snail", "Tree", "Car")),
                 class = "data.frame", row.names = c(NA, -6L))
Upvotes: 2