Reputation: 2283
I have the following dataframe (data6):
data6
n S_ID EID VO
1: 1 41883100 1 A1
2: 2 41883100 2 B22
3: 3 41883100 3 C13
4: 4 41883100 4 D18
5: 5 41883100 5 T5-7
6: 6 41883098 1 HJ89
7: 7 41883098 2 I982
8: 8 41884555 1 ZX567
9: 9 41997896 1 TYU12
I would like to keep in data6 only the rows of those S_ID groups whose maximal EID value is greater than 2 (that is, drop every S_ID whose maximal EID is 1 or 2). The result should be as follows:
data6
n S_ID EID VO
1: 1 41883100 1 A1
2: 2 41883100 2 B22
3: 3 41883100 3 C13
4: 4 41883100 4 D18
5: 5 41883100 5 T5-7
Rows 6 and 7 were deleted since for their S_ID the maximal EID was 2. Row 8 and row 9 were deleted since for each of their S_ID the maximal EID value was 1. Rows 1 to 5 are kept since the maximal value for their S_ID is 5 (in row 5) so all their relevant rows are kept (1 to 5).
Upvotes: 1
Views: 2118
Reputation: 15784
In base R:
data6[data6$S_ID %in% data6$S_ID[data6$EID>2],]
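Reading from the inside out, the idea is:
1. data6$EID>2 flags the rows whose EID is greater than 2
2. data6$S_ID[<1 above>] gives the S_ID values of those rows
3. data6$S_ID %in% <2> flags every row whose S_ID is among those values, and that logical vector is used to subset data6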
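The steps above can also be run one at a time, which makes it easier to inspect each intermediate result. This is only a sketch: the data frame is rebuilt by hand from the question's printout, and the names above, keep_ids and result are made up for illustration.

```r
# Rebuild the example data from the question as a plain data.frame
data6 <- data.frame(
  n    = 1:9,
  S_ID = c(rep(41883100, 5), rep(41883098, 2), 41884555, 41997896),
  EID  = c(1:5, 1:2, 1L, 1L),
  VO   = c("A1", "B22", "C13", "D18", "T5-7", "HJ89", "I982", "ZX567", "TYU12"),
  stringsAsFactors = FALSE
)

# Step 1: logical vector, TRUE where EID is greater than 2
above <- data6$EID > 2

# Step 2: S_ID values that have at least one such row
# (unique() is optional here, since %in% ignores duplicates)
keep_ids <- unique(data6$S_ID[above])

# Step 3: keep every row whose S_ID is in that set
result <- data6[data6$S_ID %in% keep_ids, ]
print(result)
```

Because a group's maximal EID is greater than 2 exactly when at least one of its EID values is, this matches the "maximal EID per S_ID" wording of the question.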
Upvotes: 5
Reputation: 886948
Grouped by 'S_ID', if any of the 'EID' values is greater than 2, we return the Subset of Data.table (.SD) for that group:
library(data.table)
setDT(data6)[, if(any(EID > 2)) .SD , by = S_ID]
# S_ID n EID VO
#1: 41883100 1 1 A1
#2: 41883100 2 2 B22
#3: 41883100 3 3 C13
#4: 41883100 4 4 D18
#5: 41883100 5 5 T5-7
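Note that the grouping column is moved to the front in the output above. Two hedged variants of the same idea: one phrases the condition with max(), as in the question, and the other collects row indices with .I so that the original column order is kept. The names dt, res_max, idx and res_idx are made up for illustration, and the table is rebuilt by hand from the question's printout.

```r
library(data.table)

# Rebuild the example data from the question
dt <- data.table(
  n    = 1:9,
  S_ID = c(rep(41883100, 5), rep(41883098, 2), 41884555, 41997896),
  EID  = c(1:5, 1:2, 1L, 1L),
  VO   = c("A1", "B22", "C13", "D18", "T5-7", "HJ89", "I982", "ZX567", "TYU12")
)

# Variant 1: same filter, written with max() instead of any()
res_max <- dt[, if (max(EID) > 2) .SD, by = S_ID]

# Variant 2: .I holds the row numbers of the current group; indexing it with
# a single TRUE keeps all of them, with FALSE none of them
idx <- dt[, .I[any(EID > 2)], by = S_ID]$V1
res_idx <- dt[idx]  # columns stay in the order n, S_ID, EID, VO
print(res_idx)
```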
Upvotes: 4