LDT

Reputation: 3108

Remove IDs that have a zero value at "n" or more time points with dplyr in R

My data frame looks like this:

value <- c(0,0.1,0.2,0.4,0,0.05,0.05,0.5,0.20,0.40,0.50,0.60)
time <- c(0,0,0,0,1,1,1,1,2,2,2,2)
ID <- c(1,2,3,4,1,2,3,4,1,2,3,4)

test <- data.frame(value, time, ID)
test

   value time ID
1   0.00    0  1
2   0.10    0  2
3   0.20    0  3
4   0.40    0  4
5   0.00    1  1
6   0.05    1  2
7   0.05    1  3
8   0.50    1  4
9   0.20    2  1
10  0.40    2  2
11  0.50    2  3
12  0.60    2  4

I would like to remove all IDs that have value == 0 at two or more time points. I would like my data frame to look like this (removing ID = 1, which has value == 0 at two time points):

2   0.10    0  2
3   0.20    0  3
4   0.40    0  4
6   0.05    1  2
7   0.05    1  3
8   0.50    1  4
10  0.40    2  2
11  0.50    2  3
12  0.60    2  4

Upvotes: 1

Views: 99

Answers (2)

Ronak Shah

Reputation: 389065

In base R, we can use subset with ave:

n <- 2
subset(test, ave(value == 0, ID, FUN = sum) < n)

#   value time ID
#2   0.10    0  2
#3   0.20    0  3
#4   0.40    0  4
#6   0.05    1  2
#7   0.05    1  3
#8   0.50    1  4
#10  0.40    2  2
#11  0.50    2  3
#12  0.60    2  4
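To see why this works, it may help to look at what ave returns on its own. The sketch below rebuilds the question's data and stores the intermediate result in zero_count (a name introduced here for illustration only). ave applies sum within each ID and returns a vector as long as the input, so every row carries the zero count of its own ID.

```r
# Rebuild the example data from the question
value <- c(0, 0.1, 0.2, 0.4, 0, 0.05, 0.05, 0.5, 0.20, 0.40, 0.50, 0.60)
time  <- c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2)
ID    <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
test  <- data.frame(value, time, ID)

n <- 2

# Number of zero values per ID, repeated on every row of that ID
zero_count <- ave(test$value == 0, test$ID, FUN = sum)
zero_count
# [1] 2 0 0 0 2 0 0 0 2 0 0 0

# Keep only rows whose ID has fewer than n zeros
kept <- test[zero_count < n, ]
```

Here kept holds the same nine rows as the subset call above, since only ID 1 reaches a count of 2.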

Or in data.table:

library(data.table)
setDT(test)[, .SD[sum(value == 0) < n], ID]

Upvotes: 1

akrun

Reputation: 887391

We could use filter with a logical condition: after grouping by 'ID', count the rows where value == 0 with sum, then keep only the groups where that count is less than 2.

library(dplyr)
test %>%
    group_by(ID) %>%
    filter(sum(value == 0) < 2)
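Since the question asks about a general "n", here is a minimal sketch of the same idea with the threshold as an argument. The function name drop_ids_with_zeros and the argument n_zeros are made up for this example and are not part of dplyr. It also calls ungroup() at the end, on the assumption that you do not want the result to stay grouped by ID.

```r
library(dplyr)

# Rebuild the example data from the question
value <- c(0, 0.1, 0.2, 0.4, 0, 0.05, 0.05, 0.5, 0.20, 0.40, 0.50, 0.60)
time  <- c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2)
ID    <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
test  <- data.frame(value, time, ID)

# Hypothetical helper: drop every ID that has n_zeros or more zero values
drop_ids_with_zeros <- function(df, n_zeros = 2) {
  df %>%
    group_by(ID) %>%
    # sum() counts the TRUEs in value == 0 within each ID
    filter(sum(value == 0) < n_zeros) %>%
    ungroup()
}

result <- drop_ids_with_zeros(test, n_zeros = 2)
```

With n_zeros = 2, result should contain the nine rows for IDs 2, 3 and 4. Setting n_zeros = 1 would drop any ID that has even a single zero.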

Upvotes: 2
