Searching a value in multiple columns within group and subset if true

Question

I aim to subset groups within groups, where cond==1 and other conditions ==3 three times. So, first, I look at group within id. Next, I look at cond1==1 only. If cond1==1, I count how many conditions take value 3. If I see three conditions take value 3 then I take that entire group but within id.

Here is df:

df = data.frame(id = c(rep(450, 4),rep(500, 4)),
                group = c(rep(1, 2), rep(2,2)),
                cond1 = c(1,2,1,2,1,2,2,1),
                cond2 = c(1,3,3,1,3,2,3,1),
                cond3 = c(3,3,1,2,3,1,2,3),
                cond4 = c(3,2,2,3,3,1,2,3),
                cond5 = c(3,2,3,3,3,1,2,3))

Here is how df looks like:

   id group cond1 cond2 cond3 cond4 cond5
1 450     1     1     1     3     3     3
2 450     1     2     3     3     2     2
3 450     2     1     3     1     2     3
4 450     2     2     1     2     3     3
5 500     1     1     3     3     3     3
6 500     1     2     2     1     1     1
7 500     2     2     3     2     2     2
8 500     2     1     1     3     3     3

For instance, from that data frame, group 1 for id==450 is eligible for subsetting, because in the same row, where cond==1, other conditions take value 3 at least three times. Also, group 2 for id==500 also has cond==1 and at least three values 3 in the row. The order does not matter.

Here how the result should look:


   id group cond1 cond2 cond3 cond4 cond5
1 450     1     1     1     3     3     3
2 450     1     2     3     3     2     2
7 500     2     2     3     2     2     2
8 500     2     1     1     3     3     3

akrun · Accepted Answer

Or in dplyr

library(dplyr)
df %>%
  mutate(ind = rowSums(select(., cond2:cond5) == 3) == 3) %>% 
  group_by(id, group) %>% 
  filter(if(any(ind & cond1 == 1)) all(ind[cond1==1]) else FALSE)%>%
  ungroup %>%
  select(-ind)
# A tibble: 4 x 7
#    id group cond1 cond2 cond3 cond4 cond5
#        
#1   450     1     1     1     3     3     3
#2   450     1     2     3     3     2     2
#3   500     2     2     3     2     2     2
#4   500     2     1     1     3     3     3

Searching a value in multiple columns within group and subset if true

Answers (2)

Related Questions