french_fries
french_fries

Reputation: 1

Subset dataset in specific way

I have a dataset:

x     y
A1  start
A2  ID
A3  delete
A4  start
A5  ID
A6  delete
A7  ID
A8  delete
A9  start
A10 ID
A11 delete
A12 delete
A13 start
A14 ID
A15 start
A16 delete
A17 ID
A18 delete
A19 delete

As you see there is a connection in y column: "delete" comes after "ID". However there are exceptions: in A12 "delete" comes after "delete" and in A16 "delete" comes after "start" and in A19 "delete" comes after "delete". How could i subset only these"deletes" which don't come after "ID". So desired result is:

x     y
A12 delete
A16 delete
A19 delete

Upvotes: 1

Views: 48

Answers (2)

GKi
GKi

Reputation: 39657

In base you can subset with a boolean vector indicating when a delete did not come after ID.

z[c(FALSE, z$y[-nrow(z)] != "ID" & z$y[-1] == "delete"),]
#     x      y
#12 A12 delete
#16 A16 delete
#19 A19 delete

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 388982

You can use lag in dplyr :

library(dplyr)
df %>%  filter(y == 'delete' & lag(y) != 'ID')

#    x      y
#1 A12 delete
#2 A16 delete
#3 A19 delete

and equivalent in data.table :

library(data.table)
setDT(df)[y == 'delete' & shift(y) != 'ID']

Upvotes: 2

Related Questions