spidermarn
spidermarn

Reputation: 939

How to filter data as long as a group contains certain values?

I have a data set like this:

VisitID | Item |
1       | A    |
1       | B    |
1       | C    |
1       | D    |
2       | A    |
2       | D    |
2       | B    |
3       | B    |
3       | C    |
4       | D    |
4       | C    |

I want to return VisitID rows as long as that VisitID had a occurrence of item A OR B. How do I go about? Expected Result:

VisitID | Item |
1       | A    |
1       | B    |
1       | C    |
1       | D    |
2       | A    |
2       | D    |
2       | B    |
3       | B    |
3       | C    |

Upvotes: 0

Views: 95

Answers (4)

cbo
cbo

Reputation: 1763

Another base solution :.

groups <- rowsum(x = as.numeric( df$Item %in% c("A", "B") ), group = df$VisitID)
df[df$VisitID %in% rownames(groups)[groups > 0] , ]

Upvotes: 1

Darren Tsai
Darren Tsai

Reputation: 35554

Another filtering condition without grouping the data:

library(dplyr)

df %>%
  filter(VisitID %in% VisitID[Item %in% c('A', 'B')])

#   VisitID Item
# 1       1    A
# 2       1    B
# 3       1    C
# 4       1    D
# 5       2    A
# 6       2    D
# 7       2    B
# 8       3    B
# 9       3    C

Upvotes: 2

SimeonL
SimeonL

Reputation: 77

In base R, I would do:

df <- data.frame(VisitID = c(1,1,1,1,2,2,2,3,3,4,4), 
                 Item = c("A", "B", "C", "D", "A", "D", "B", "B", "C", "D", "C"))

do.call("rbind", lapply(split(df, df$VisitID), function(x) if(any(x[,2] %in% c("A", "B"))) x else NULL))

# VisitID Item
# 1.1       1    A
# 1.2       1    B
# 1.3       1    C
# 1.4       1    D
# 2.5       2    A
# 2.6       2    D
# 2.7       2    B
# 3.8       3    B
# 3.9       3    C

Upvotes: 4

Ronak Shah
Ronak Shah

Reputation: 388807

We can select groups where any Item has either 'A' or 'B'.

library(dplyr)
df %>% group_by(VisitID) %>% filter(any(Item %in% c('A', 'B')))

#  VisitID Item 
#    <int> <fct>
#1       1 A    
#2       1 B    
#3       1 C    
#4       1 D    
#5       2 A    
#6       2 D    
#7       2 B    
#8       3 B    
#9       3 C    

Or in base R :

subset(df, ave(Item %in% c('A', 'B'), VisitID, FUN = any))

and in data.table

library(data.table)
setDT(df)[, .SD[any(Item %in% c('A', 'B'))], VisitID]

data

df <- structure(list(VisitID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
4L, 4L), Item = structure(c(1L, 2L, 3L, 4L, 1L, 4L, 2L, 2L, 3L, 
4L, 3L), .Label = c("A", "B", "C", "D"), class = "factor")), 
row.names = c(NA, -11L), class = "data.frame")

Upvotes: 2

Related Questions