Avi

Reputation: 2283

Removing rows from data frame below threshold

I have the following dataframe (data6):

data6

   n     S_ID EID    VO
1: 1 41883100   1    A1
2: 2 41883100   2   B22
3: 3 41883100   3   C13
4: 4 41883100   4   D18
5: 5 41883100   5  T5-7
6: 6 41883098   1  HJ89
7: 7 41883098   2  I982
8: 8 41884555   1 ZX567
9: 9 41997896   1 TYU12

I would like to keep in data6 only the rows that belong to an S_ID whose maximal EID value is greater than 2 (that is, delete every row of an S_ID whose maximal EID is 1 or 2). The result should be as follows:

data6

   n     S_ID EID   VO
1: 1 41883100   1   A1
2: 2 41883100   2  B22
3: 3 41883100   3  C13
4: 4 41883100   4  D18
5: 5 41883100   5 T5-7

Rows 6 and 7 were deleted since for their S_ID the maximal EID was 2. Row 8 and row 9 were deleted since for each of their S_ID the maximal EID value was 1. Rows 1 to 5 are kept since the maximal value for their S_ID is 5 (in row 5) so all their relevant rows are kept (1 to 5).

Upvotes: 1

Views: 2118

Answers (2)

Tensibai

Reputation: 15784

In base R:

data6[data6$S_ID %in% data6$S_ID[data6$EID > 2], ]

Reading from the inside out, the idea is to:

  1. find the rows where EID > 2 with data6$EID > 2
  2. get the corresponding S_ID values with data6$S_ID[<1 above>]
  3. build the logical vector that marks every row whose S_ID is among those from <2>: data6$S_ID %in% <2>
  4. filter the original data frame by the logical vector obtained in <3>
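
The four steps can also be written out one at a time, which makes each intermediate result easy to inspect. This is only a sketch: the data frame is rebuilt by hand from the question, the intermediate names (above, keep_ids, keep_row, result) are made up for illustration, and the unique() call is an optional addition that is not in the one-liner.

```r
# Rebuild the example data from the question as a plain data.frame
data6 <- data.frame(
  n    = 1:9,
  S_ID = c(rep(41883100, 5), rep(41883098, 2), 41884555, 41997896),
  EID  = c(1:5, 1:2, 1, 1),
  VO   = c("A1", "B22", "C13", "D18", "T5-7", "HJ89", "I982", "ZX567", "TYU12"),
  stringsAsFactors = FALSE
)

# Step 1: logical vector, TRUE where EID is greater than 2
above <- data6$EID > 2

# Step 2: the S_ID values of those rows (unique() only removes repeats)
keep_ids <- unique(data6$S_ID[above])

# Step 3: logical vector, TRUE for every row whose S_ID is in keep_ids
keep_row <- data6$S_ID %in% keep_ids

# Step 4: keep only the marked rows
result <- data6[keep_row, ]
print(result)
```

Because step 3 tests membership per row, rows with EID 1 and 2 survive as long as some other row with the same S_ID has an EID above 2.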

Upvotes: 5

akrun

Reputation: 886948

Grouped by 'S_ID', if any 'EID' in the group is greater than 2, we return the Subset of Data.table (.SD) for that group; groups that fail the test return nothing and are dropped.

library(data.table)
setDT(data6)[, if(any(EID > 2)) .SD , by = S_ID]
#       S_ID n EID   VO
#1: 41883100 1   1   A1
#2: 41883100 2   2  B22
#3: 41883100 3   3  C13
#4: 41883100 4   4  D18
#5: 41883100 5   5 T5-7
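
Note that the grouped result puts S_ID first. Two possible variations are sketched below, assuming the data.table package is installed: one phrases the test as max(EID) > 2 to mirror the question's wording, and one collects row indices with .I so that the original column order is kept. The data is rebuilt by hand from the question, and the names res_max, keep, and res_order are illustrative only.

```r
library(data.table)

# Rebuild the example data from the question as a data.table
data6 <- data.table(
  n    = 1:9,
  S_ID = c(rep(41883100, 5), rep(41883098, 2), 41884555, 41997896),
  EID  = c(1:5, 1:2, 1, 1),
  VO   = c("A1", "B22", "C13", "D18", "T5-7", "HJ89", "I982", "ZX567", "TYU12")
)

# Variant 1: test the group maximum directly; the grouping column comes first
res_max <- data6[, if (max(EID) > 2) .SD, by = S_ID]

# Variant 2: gather the row numbers (.I) of qualifying groups, then subset,
# which leaves the columns in their original order
keep <- data6[, .I[any(EID > 2)], by = S_ID]$V1
res_order <- data6[keep]

print(res_max)
print(res_order)
```

For this data, both variants keep the same five rows as the if(any(EID > 2)) .SD version; they differ only in column order.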

Upvotes: 4
