Reputation: 306
I aim to subset groups within groups, where cond==1
and other conditions ==3
three times. So, first, I look at group
within id
. Next, I look at cond1==1
only. If cond1==1
, I count how many conditions take value 3
. If I see three conditions take value 3
then I take that entire group
but within id
.
Here is df
:
df = data.frame(id = c(rep(450, 4),rep(500, 4)),
group = c(rep(1, 2), rep(2,2)),
cond1 = c(1,2,1,2,1,2,2,1),
cond2 = c(1,3,3,1,3,2,3,1),
cond3 = c(3,3,1,2,3,1,2,3),
cond4 = c(3,2,2,3,3,1,2,3),
cond5 = c(3,2,3,3,3,1,2,3))
Here is how df
looks like:
id group cond1 cond2 cond3 cond4 cond5
1 450 1 1 1 3 3 3
2 450 1 2 3 3 2 2
3 450 2 1 3 1 2 3
4 450 2 2 1 2 3 3
5 500 1 1 3 3 3 3
6 500 1 2 2 1 1 1
7 500 2 2 3 2 2 2
8 500 2 1 1 3 3 3
For instance, from that data frame, group 1 for id==450
is eligible for subsetting, because in the same row, where cond==1
, other conditions take value 3 at least three times. Also, group 2 for id==500
also has cond==1
and at least three values 3
in the row. The order does not matter.
Here how the result should look:
id group cond1 cond2 cond3 cond4 cond5
1 450 1 1 1 3 3 3
2 450 1 2 3 3 2 2
7 500 2 2 3 2 2 2
8 500 2 1 1 3 3 3
Upvotes: 2
Views: 250
Reputation: 101663
Here is a base R option
r <- do.call(
rbind,
lapply(
split(df, df[c("id", "group")]),
function(v) subset(v, rowSums(v[-c(1:3)] == 3) >= 3)
)
)
dfout <- `row.names<-`(r[order(r$id,r$group),],NULL)
such that
> df
id group cond1 cond2 cond3 cond4 cond5
1 450 1 1 1 3 3 3
2 450 1 2 3 3 2 2
3 450 2 1 3 1 2 3
4 450 2 2 1 2 3 3
5 500 1 1 3 3 3 3
6 500 1 2 2 1 1 1
7 500 2 2 3 2 2 2
8 500 2 1 1 3 3 3
If you want to keep the row number from the original data frame, you can try
r <- Reduce(
rbind,
lapply(
split(df, df[c("id", "group")]),
function(v) subset(v, rowSums(v[-c(1:3)] == 3) >= 3)
)
)
dfout <- r[order(as.integer(rownames(r))), ]
which gives
> dfout
id group cond1 cond2 cond3 cond4 cond5
1 450 1 1 1 3 3 3
3 450 2 1 3 3 2 3
5 500 1 1 3 3 3 3
8 500 2 1 1 3 3 3
Upvotes: 1
Reputation: 887213
Or in dplyr
library(dplyr)
df %>%
mutate(ind = rowSums(select(., cond2:cond5) == 3) == 3) %>%
group_by(id, group) %>%
filter(if(any(ind & cond1 == 1)) all(ind[cond1==1]) else FALSE)%>%
ungroup %>%
select(-ind)
# A tibble: 4 x 7
# id group cond1 cond2 cond3 cond4 cond5
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 450 1 1 1 3 3 3
#2 450 1 2 3 3 2 2
#3 500 2 2 3 2 2 2
#4 500 2 1 1 3 3 3
Upvotes: 1