Reputation: 107
Back again with an issue that is simple on paper, but I am struggling with the implementation. For context, the data links suspects to victims, and what we want to check is whether a suspect's current victim is the same as their last victim. If the suspect's latest victim is different from the last one, that is an entry we want to flag and keep. If they are the same, we remove the record.
So the comparison is:
Suspect A on Date 1 relates to Victim 1 = Suspect A on Date 2 relates to Victim 1 = Drop
Suspect B on Date 1 relates to Victim 2 = Suspect B on Date 2 relates to Victim 3 = Keep
Date | Suspect | Victim |
---|---|---|
15/01/2022 | A | 1 |
12/03/2022 | A | 1 |
19/02/2022 | B | 2 |
16/01/2022 | B | 3 |
08/03/2022 | B | 4 |
20/03/2022 | B | 5 |
25/01/2022 | C | 5 |
21/02/2022 | D | 6 |
10/01/2022 | D | 7 |
Assume this is my current data set. In this context, 'Suspect' should end up with only two values, B and D, while A and C are removed.
I was thinking of arranging by Date and Suspect, then lagging the comparison. But how does lag work when it jumps between suspects? Can that be solved with a grouping variable? This is where I am stuck conceptualising it, and I fear removing things that should be included.
Any help, as always, is greatly appreciated.
Upvotes: 0
Views: 97
Reputation: 160607
Try this:
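library(dplyr)
dat %>%
  group_by(Suspect) %>%
  # keep suspects with more than one record, dropping rows whose victim matches that suspect's last row
  filter(n() > 1 & Victim != last(Victim))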
# # A tibble: 4 x 3
# # Groups: Suspect [2]
# Date Suspect Victim
# <chr> <chr> <int>
# 1 19/02/2022 B 2
# 2 16/01/2022 B 3
# 3 08/03/2022 B 4
# 4 21/02/2022 D 6
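On the question about lag jumping between suspects: a grouping variable does handle that, because inside `group_by()` the lag restarts for every suspect. Below is a minimal sketch of that date-ordered variant, assuming the dates are dd/mm/yyyy text as in the sample and that "last" means the suspect's previous record by date. The names `prev_victim` and `flagged` are made up for illustration. Note this is a different reading from the filter above: it keeps each record whose victim differs from the one on the suspect's previous record, rather than comparing against the suspect's final row.

```r
library(dplyr)

# Sample data copied from the question; Date is kept as dd/mm/yyyy text.
dat <- data.frame(
  Date = c("15/01/2022", "12/03/2022", "19/02/2022", "16/01/2022", "08/03/2022",
           "20/03/2022", "25/01/2022", "21/02/2022", "10/01/2022"),
  Suspect = c("A", "A", "B", "B", "B", "B", "C", "D", "D"),
  Victim = c(1L, 1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L),
  stringsAsFactors = FALSE
)

flagged <- dat %>%
  # Parse the text dates so that sorting is chronological, not alphabetical.
  mutate(Date = as.Date(Date, format = "%d/%m/%Y")) %>%
  arrange(Suspect, Date) %>%
  group_by(Suspect) %>%
  # Within a group, lag() never reaches into another suspect's rows;
  # the first row of each suspect gets NA.
  mutate(prev_victim = dplyr::lag(Victim)) %>%
  # Keep only records where a previous record exists and the victim changed.
  filter(!is.na(prev_victim), Victim != prev_victim) %>%
  ungroup()

flagged
```

On the sample data this also leaves only suspects B and D, but the rows kept are the later record of each changed pair, so pick whichever reading matches your rule.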
Upvotes: 1