Reputation: 3108
My data frame looks like this:
value <- c(0,0.1,0.2,0.4,0,0.05,0.05,0.5,0.20,0.40,0.50,0.60)
time <- c(0,0,0,0,1,1,1,1,2,2,2,2)
ID <- c(1,2,3,4,1,2,3,4,1,2,3,4)
test <- data.frame(value, time, ID)
test
value time ID
1 0.00 0 1
2 0.10 0 2
3 0.20 0 3
4 0.40 0 4
5 0.00 1 1
6 0.05 1 2
7 0.05 1 3
8 0.50 1 4
9 0.20 2 1
10 0.40 2 2
11 0.50 2 3
12 0.60 2 4
I would like to remove all IDs that have value == 0 at two or more time points. I would like my data frame to look like this (removing ID = 1, which has value = 0 at two time points):
2 0.10 0 2
3 0.20 0 3
4 0.40 0 4
6 0.05 1 2
7 0.05 1 3
8 0.50 1 4
10 0.40 2 2
11 0.50 2 3
12 0.60 2 4
Upvotes: 1
Views: 99
Reputation: 389065
In base R, we can use subset with ave:
n <- 2
subset(test, ave(value == 0, ID, FUN = sum) < n)
# value time ID
#2 0.10 0 2
#3 0.20 0 3
#4 0.40 0 4
#6 0.05 1 2
#7 0.05 1 3
#8 0.50 1 4
#10 0.40 2 2
#11 0.50 2 3
#12 0.60 2 4
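To see why this works, it can help to look at the intermediate vector that ave returns. The following is only a sketch, rebuilding the test data frame from the question; the name zero_count is introduced here purely for illustration.

```r
# Rebuild the example data from the question.
value <- c(0, 0.1, 0.2, 0.4, 0, 0.05, 0.05, 0.5, 0.20, 0.40, 0.50, 0.60)
time  <- c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2)
ID    <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
test  <- data.frame(value, time, ID)

# For every row, count how many rows with the same ID have value == 0.
# ave() returns a vector as long as its input, so it lines up with the rows.
# zero_count is a hypothetical helper name, not part of the original answer.
zero_count <- ave(test$value == 0, test$ID, FUN = sum)
zero_count
# [1] 2 0 0 0 2 0 0 0 2 0 0 0

# Keep only the rows whose ID has fewer than 2 zeros.
test[zero_count < 2, ]
```

Because the count is repeated on every row of an ID, comparing it with the threshold gives a row-level logical vector, which is what subset expects.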
Or in data.table:
library(data.table)
setDT(test)[, .SD[sum(value == 0) < n], ID]
Upvotes: 1
Reputation: 887391
We could use filter with a logical condition: after grouping by 'ID', count the rows where value == 0 with sum, then turn that count into a logical by testing whether it is less than 2.
library(dplyr)
test %>%
  group_by(ID) %>%
  filter(sum(value == 0) < 2)
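If it is unclear which IDs the condition will drop, one option is to inspect the per-ID counts first. This is a rough sketch that assumes dplyr is installed and rebuilds the question's data; zero_counts and n_zero are names made up for this example.

```r
library(dplyr)

# Rebuild the example data from the question.
test <- data.frame(
  value = c(0, 0.1, 0.2, 0.4, 0, 0.05, 0.05, 0.5, 0.20, 0.40, 0.50, 0.60),
  time  = c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2),
  ID    = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
)

# One row per ID, holding the number of time points where value == 0.
# zero_counts and n_zero are hypothetical names used only for this sketch.
zero_counts <- test %>%
  group_by(ID) %>%
  summarise(n_zero = sum(value == 0))

# ID 1 has n_zero = 2, so it fails the "< 2" test; IDs 2, 3 and 4 have 0.
zero_counts
```

The filter call applies the same sum(value == 0) expression within each group, so any ID whose n_zero is 2 or more here is the one removed there.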
Upvotes: 2