Reputation: 77
My data frame looks like this:
df <- read.table(text="
id date paid_at binded_at
1 107 2016-12-16 2017-06-02 2017-06-07
2 107 2017-11-27 2017-06-02 2017-06-07
3 107 2017-11-28 2017-06-02 2017-06-07
4 109 2016-11-28 2017-01-01 2017-06-07
5 109 2017-11-29 2017-01-01 2017-06-07
6 110 2017-12-04 2018-01-01 2017-06-07", header=TRUE)
The goal is, for each id, to compare the paid_at and date columns and keep only the row with the closest date after paid_at. For example, id 107 was paid on 2017-06-02, and since the closest date after that is 2017-11-27, the second row should be kept. The same goes for id 109: the closest date after 2017-01-01 is 2017-11-29, so that row should be kept. My problem is a case like id 110: there is no date after 2018-01-01, so for this id the 6th row should be kept. The result should be:
result <- read.table(text="
id date paid_at binded_at
1 107 2017-11-27 2017-06-02 2017-06-07
2 109 2017-11-29 2017-01-01 2017-06-07
3 110 2017-12-04 2018-01-01 2017-06-07", header=TRUE)
This is the code I have so far:
result <- df %>%
group_by(id) %>%
filter(paid_at > date)
Upvotes: 0
Views: 92
Reputation: 20085
An option is to let any id that has only one row pass through the filter, since for those ids the condition may never match. The solution shown by the OP can be extended to include ids with only 1 row as:
library(dplyr)
df %>%
  mutate_at(vars(2:4), as.Date) %>%        # convert the three date columns to Date
  group_by(id) %>%
  filter(paid_at < date | n() == 1) %>%    # also keep groups with a single row
  arrange(date) %>%
  slice(1)                                 # keep only the earliest remaining row per id
# # A tibble: 3 x 4
# # Groups: id [3]
# id date paid_at binded_at
# <int> <date> <date> <date>
# 1 107 2017-11-27 2017-06-02 2017-06-07
# 2 109 2017-11-29 2017-01-01 2017-06-07
# 3 110 2017-12-04 2018-01-01 2017-06-07
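The n() == 1 check only rescues ids with a single row. If an id could have several rows and still no date after paid_at, a slightly more general variant might look like the sketch below. It assumes that in such a case the latest available date is the row to keep; that fallback rule is an assumption and is not stated in the question. The name closest is only illustrative.

```r
library(dplyr)

df <- read.table(text = "
id date paid_at binded_at
1 107 2016-12-16 2017-06-02 2017-06-07
2 107 2017-11-27 2017-06-02 2017-06-07
3 107 2017-11-28 2017-06-02 2017-06-07
4 109 2016-11-28 2017-01-01 2017-06-07
5 109 2017-11-29 2017-01-01 2017-06-07
6 110 2017-12-04 2018-01-01 2017-06-07", header = TRUE)

closest <- df %>%
  # convert the date columns from text to Date so comparisons are chronological
  mutate(date = as.Date(date),
         paid_at = as.Date(paid_at),
         binded_at = as.Date(binded_at)) %>%
  group_by(id) %>%
  # per id: earliest date after paid_at if one exists,
  # otherwise (assumed fallback) the latest date in the group
  filter(date == if (any(date > paid_at)) min(date[date > paid_at]) else max(date)) %>%
  ungroup()

closest
```

On the sample data this keeps the same three rows as above (2017-11-27 for id 107, 2017-11-29 for id 109, 2017-12-04 for id 110). If two rows of one id share the selected date, both are kept; adding slice(1) before ungroup() would reduce that to one row.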
Upvotes: 1