Removing rows from data frame below threshold

Question

I have the following dataframe (data6):

data6

   n     S_ID EID    VO
1: 1 41883100   1    A1
2: 2 41883100   2   B22
3: 3 41883100   3   C13
4: 4 41883100   4   D18
5: 5 41883100   5  T5-7
6: 6 41883098   1  HJ89
7: 7 41883098   2  I982
8: 8 41884555   1 ZX567
9: 9 41997896   1 TYU12

I would like to keep in data6 only the rows whose S_ID group has a maximal EID greater than 2 (that is, drop every S_ID whose EID values never exceed 2). The result should be as follows:

data6

   n     S_ID EID   VO
1: 1 41883100   1   A1
2: 2 41883100   2  B22
3: 3 41883100   3  C13
4: 4 41883100   4  D18
5: 5 41883100   5 T5-7

Rows 6 and 7 were deleted since for their S_ID the maximal EID was 2. Row 8 and row 9 were deleted since for each of their S_ID the maximal EID value was 1. Rows 1 to 5 are kept since the maximal value for their S_ID is 5 (in row 5) so all their relevant rows are kept (1 to 5).

akrun · Accepted Answer

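Grouping by 'S_ID', if any 'EID' value in a group is greater than 2, we return that group's Subset of Data (.SD); otherwise the group is dropped. Note that the grouping column 'S_ID' moves to the front of the result.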

library(data.table)
# setDT() converts data6 to a data.table by reference; .SD is returned only for qualifying groups
setDT(data6)[, if (any(EID > 2)) .SD, by = S_ID]
#       S_ID n EID   VO
#1: 41883100 1   1   A1
#2: 41883100 2   2  B22
#3: 41883100 3   3  C13
#4: 41883100 4   4  D18
#5: 41883100 5   5 T5-7
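
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example table from the question and applies the same group filter. The names `kept`, `idx` and `kept2` are just illustrative, and the column types (numeric S_ID, integer EID) are my assumption since the question does not show them. It also shows the `.I` row-index idiom as one possible alternative, which keeps the original column order instead of moving S_ID to the front.

```r
library(data.table)

# Rebuild the example table from the question (column types are assumed)
data6 <- data.table(
  n    = 1:9,
  S_ID = c(rep(41883100, 5), rep(41883098, 2), 41884555, 41997896),
  EID  = c(1:5, 1:2, 1L, 1L),
  VO   = c("A1", "B22", "C13", "D18", "T5-7", "HJ89", "I982", "ZX567", "TYU12")
)

# Keep whole groups whose maximal EID exceeds 2
# (for this purpose max(EID) > 2 is equivalent to any(EID > 2))
kept <- data6[, if (max(EID) > 2) .SD, by = S_ID]

# Alternative: collect the row numbers of qualifying groups with .I,
# then subset the original table so the column order is unchanged
idx   <- data6[, .I[max(EID) > 2], by = S_ID]$V1
kept2 <- data6[idx]
```

Both `kept` and `kept2` should contain only the five rows of S_ID 41883100. Assign the result back to `data6` if you want to overwrite the original table.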
