Reputation: 939
I have a data set like this:
VisitID | Item |
1 | A |
1 | B |
1 | C |
1 | D |
2 | A |
2 | D |
2 | B |
3 | B |
3 | C |
4 | D |
4 | C |
I want to return VisitID rows as long as that VisitID had a occurrence of item A OR B. How do I go about? Expected Result:
VisitID | Item |
1 | A |
1 | B |
1 | C |
1 | D |
2 | A |
2 | D |
2 | B |
3 | B |
3 | C |
Upvotes: 0
Views: 95
Reputation: 1763
Another base
solution :.
groups <- rowsum(x = as.numeric( df$Item %in% c("A", "B") ), group = df$VisitID)
df[df$VisitID %in% rownames(groups)[groups > 0] , ]
Upvotes: 1
Reputation: 35554
Another filtering condition without grouping the data:
library(dplyr)
df %>%
filter(VisitID %in% VisitID[Item %in% c('A', 'B')])
# VisitID Item
# 1 1 A
# 2 1 B
# 3 1 C
# 4 1 D
# 5 2 A
# 6 2 D
# 7 2 B
# 8 3 B
# 9 3 C
Upvotes: 2
Reputation: 77
In base R, I would do:
df <- data.frame(VisitID = c(1,1,1,1,2,2,2,3,3,4,4),
Item = c("A", "B", "C", "D", "A", "D", "B", "B", "C", "D", "C"))
do.call("rbind", lapply(split(df, df$VisitID), function(x) if(any(x[,2] %in% c("A", "B"))) x else NULL))
# VisitID Item
# 1.1 1 A
# 1.2 1 B
# 1.3 1 C
# 1.4 1 D
# 2.5 2 A
# 2.6 2 D
# 2.7 2 B
# 3.8 3 B
# 3.9 3 C
Upvotes: 4
Reputation: 388807
We can select groups where any
Item
has either 'A'
or 'B'
.
library(dplyr)
df %>% group_by(VisitID) %>% filter(any(Item %in% c('A', 'B')))
# VisitID Item
# <int> <fct>
#1 1 A
#2 1 B
#3 1 C
#4 1 D
#5 2 A
#6 2 D
#7 2 B
#8 3 B
#9 3 C
Or in base R :
subset(df, ave(Item %in% c('A', 'B'), VisitID, FUN = any))
and in data.table
library(data.table)
setDT(df)[, .SD[any(Item %in% c('A', 'B'))], VisitID]
data
df <- structure(list(VisitID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
4L, 4L), Item = structure(c(1L, 2L, 3L, 4L, 1L, 4L, 2L, 2L, 3L,
4L, 3L), .Label = c("A", "B", "C", "D"), class = "factor")),
row.names = c(NA, -11L), class = "data.frame")
Upvotes: 2