Laura
Laura

Reputation: 306

Searching a value in multiple columns within group and subset if true

I aim to subset groups within groups, where cond==1 and other conditions ==3 three times. So, first, I look at group within id. Next, I look at cond1==1 only. If cond1==1, I count how many conditions take value 3. If I see three conditions take value 3 then I take that entire group but within id.

Here is df:

df = data.frame(id = c(rep(450, 4),rep(500, 4)),
                group = c(rep(1, 2), rep(2,2)),
                cond1 = c(1,2,1,2,1,2,2,1),
                cond2 = c(1,3,3,1,3,2,3,1),
                cond3 = c(3,3,1,2,3,1,2,3),
                cond4 = c(3,2,2,3,3,1,2,3),
                cond5 = c(3,2,3,3,3,1,2,3))

Here is how df looks like:

   id group cond1 cond2 cond3 cond4 cond5
1 450     1     1     1     3     3     3
2 450     1     2     3     3     2     2
3 450     2     1     3     1     2     3
4 450     2     2     1     2     3     3
5 500     1     1     3     3     3     3
6 500     1     2     2     1     1     1
7 500     2     2     3     2     2     2
8 500     2     1     1     3     3     3

For instance, from that data frame, group 1 for id==450 is eligible for subsetting, because in the same row, where cond==1, other conditions take value 3 at least three times. Also, group 2 for id==500 also has cond==1 and at least three values 3 in the row. The order does not matter.

Here how the result should look:


   id group cond1 cond2 cond3 cond4 cond5
1 450     1     1     1     3     3     3
2 450     1     2     3     3     2     2
7 500     2     2     3     2     2     2
8 500     2     1     1     3     3     3

Upvotes: 2

Views: 250

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 101663

Here is a base R option

r <- do.call(
    rbind,
    lapply(
        split(df, df[c("id", "group")]),
        function(v) subset(v, rowSums(v[-c(1:3)] == 3) >= 3)
    )
)
dfout <- `row.names<-`(r[order(r$id,r$group),],NULL)

such that

> df
   id group cond1 cond2 cond3 cond4 cond5
1 450     1     1     1     3     3     3
2 450     1     2     3     3     2     2
3 450     2     1     3     1     2     3
4 450     2     2     1     2     3     3
5 500     1     1     3     3     3     3
6 500     1     2     2     1     1     1
7 500     2     2     3     2     2     2
8 500     2     1     1     3     3     3

If you want to keep the row number from the original data frame, you can try

r <- Reduce(
    rbind,
    lapply(
        split(df, df[c("id", "group")]),
        function(v) subset(v, rowSums(v[-c(1:3)] == 3) >= 3)
    )
)
dfout <- r[order(as.integer(rownames(r))), ]

which gives

> dfout
   id group cond1 cond2 cond3 cond4 cond5
1 450     1     1     1     3     3     3
3 450     2     1     3     3     2     3
5 500     1     1     3     3     3     3
8 500     2     1     1     3     3     3

Upvotes: 1

akrun
akrun

Reputation: 887213

Or in dplyr

library(dplyr)
df %>%
  mutate(ind = rowSums(select(., cond2:cond5) == 3) == 3) %>% 
  group_by(id, group) %>% 
  filter(if(any(ind & cond1 == 1)) all(ind[cond1==1]) else FALSE)%>%
  ungroup %>%
  select(-ind)
# A tibble: 4 x 7
#    id group cond1 cond2 cond3 cond4 cond5
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1   450     1     1     1     3     3     3
#2   450     1     2     3     3     2     2
#3   500     2     2     3     2     2     2
#4   500     2     1     1     3     3     3

Upvotes: 1

Related Questions