Reputation: 41
I have a data frame in R like the following:
  Group.ID status
1        1   open
2        1   open
3        2   open
4        2 closed
5        2 closed
6        3   open
I want to count the number of group IDs for which every status is "open". For example, Group ID 1 has two observations, both with status "open", so it adds one to my count. Group ID 2 does not count, because not all of its statuses are "open".
I can count rows or group IDs that satisfy a row-level condition. However, I don't know how to apply the "all statuses equal one value within a group" logic.
DATA.
df1 <-
structure(list(Group.ID = c(1, 1, 2, 2, 2, 3), status = structure(c(2L,
2L, 2L, 1L, 1L, 2L), .Label = c("closed", "open"), class = "factor")), .Names = c("Group.ID",
"status"), row.names = c(NA, -6L), class = "data.frame")
Upvotes: 4
Views: 67
Reputation: 10352
A dplyr solution:
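library(dplyr)
# Flag each group as TRUE only when every status in it is "open", then count the flagged groups
df1 %>%
  group_by(Group.ID) %>%
  summarise(all_open = all(status == "open")) %>%
  filter(all_open) %>%
  nrow()
#[1] 2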
Upvotes: 0
Reputation: 76402
Here are two solutions, both using base R: one, more complicated, with aggregate, and the other with tapply. If you just want the total count of Group.ID values matching your request, I suggest that you use the second solution.
agg <- aggregate(status ~ Group.ID, df1, function(x) as.integer(all(x == "open")))
sum(agg$status)
#[1] 2
sum(tapply(df1$status, df1$Group.ID, FUN = function(x) all(x == "open")))
#[1] 2
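If you also want to see which groups qualify, not just how many, the intermediate result of tapply can be kept and inspected. The following is only a sketch along those lines: it rebuilds the question's data with data.frame() instead of the dput output, and the names all_open, n_all_open and open_ids are made up for illustration.

```r
# Rebuild the example data from the question (same values, simpler constructor).
df1 <- data.frame(
  Group.ID = c(1, 1, 2, 2, 2, 3),
  status   = factor(c("open", "open", "open", "closed", "closed", "open"))
)

# One logical value per group: TRUE only if every row in that group is "open".
all_open <- tapply(df1$status, df1$Group.ID, function(x) all(x == "open"))

# Summing the logical values counts the groups flagged TRUE.
n_all_open <- sum(all_open)

# The names of the TRUE entries are the qualifying group IDs (as character strings).
open_ids <- names(all_open)[all_open]
```

With the example data, n_all_open is 2 and open_ids holds "1" and "3", which matches the count given by both solutions above.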
Upvotes: 1